Open Architecture Handbook
The Borland Developer's Technical Guide
_________________________________________________

BORLAND INTERNATIONAL, INC.
100 BORLAND WAY
P.O. BOX 660001
SCOTTS VALLEY, CA 95067-0001

Copyright © 1991, 1993 by Borland International. All rights reserved. All Borland products are trademarks or registered trademarks of Borland International, Inc. Windows, as used in this manual, shall refer to Microsoft's implementation of a windows system. Other brand and product names are trademarks or registered trademarks of their respective holders.

PRINTED IN THE USA.
R1 10 9 8 7 6 5 4 3 2 1

INTRODUCTION
________________________________________________________________________________

This book presents technical information about several of Borland's language tools, including internal functions, implementation details, file formats, and other technical specifications. It is for advanced users and corporate developers who want to utilize the "behind the scenes" features of Borland's products to develop their own customized tools and environments, and to provide better compatibility with existing code and tools from other vendors.

Why open architecture?

At the beginning of the PC's second decade, one word has captured the spirit and attention of the entire computer industry. It is the word open. Today we hear more and more about open systems, open standards, open tools . . . and open architectures. Along with object-oriented design, the open architecture movement heralds a new era of modular software that is designed to be shareable, extensible and compatible.
Just as today's users want a database that integrates smoothly with their word processor, their spreadsheet and their company's mainframe database, so today's software developers demand editors, compilers, debuggers, application frameworks and other tools that they can "mix and match," tools that they can extend or enhance themselves, open development environments that they can customize to work the way they want. The age of closed environments and inaccessible proprietary architectures is coming to an end. With more open and compatible software tools, programmers are better able to create the exciting, reliable and cost-effective software that the nineties will demand.

Borland language tools

As a leader in the object-oriented design revolution, Borland maintains an unqualified commitment to the open architecture movement. This book, as part of that commitment, provides detailed technical information about the "guts" of Borland's language development tools: internal file formats, compiler implementation details, debugger record structures and much more. This information will enable programmers to extend Borland's tools to meet their own needs, help third-party developers spawn compatible add-on tools, aid software engineers in squeezing out the utmost performance levels from their code by taking advantage of implementation-specific features, and give all programmers greater control and independence over their development environment.

How to use this book

This book, as befits its subject, is not for the novice user or the technically unsophisticated. Written largely by the Borland developers who actually created the tools described, its style is terse and technical. Every effort has been made to present the topics clearly and in an easy-to-read manner, but the presentation is not a "tutorial," nor are basic concepts of the tools discussed at great length.
It is best viewed as a collection of technical papers by developers for developers, presenting hard-to-find information in a convenient and readily accessible form.

In the chapters which follow, individual specifications will be presented for these Borland tools and standards:

Tools discussed

C++ object mapping: a detailed description of the Borland C++ implementation's internal strategy for representing objects of various types. Included are the compiler's name mangling rules and discussions of class data and function members, object initialization, hidden parameters, RTL helper functions, virtual tables and vtable pointers, and dynamically dispatched virtual tables (DDVTs).

Object file format: a listing of the structure and content of each type of record emitted by Borland C++ when it produces object files.

VIRDEF records: a discussion and format listing of Borland's VIRDEF record type. VIRDEF records are utilized by the linker to support virtual definitions for some C++ types. A VIRDEF record is otherwise similar to a COMDEF record.

Symbol table format: presents a brief discussion and layout of the general symbol table which appears at the head of each .EXE file. The symbol table contains TLINK debugger and browser information.

Project file format: a detailed layout of the Borland C++ Project file format, used by the IDE's Project Make facility.

Borland Graphics Interface: describes BGI driver architecture, headers, status and vector tables, and structure, and provides a cookbook and examples.

ObjectWindows: Borland's ObjectWindows Library (OWL) is a complete application framework for Windows developers. This chapter presents the technical specification for the library, including class structure, protocol and behavior, as well as implementation notes.

Borland Windows Custom Controls: presents the technical specifications, usage conventions, and a listing of notification messages in the BWCC custom controls and dialog classes.
Borland Help System: defines the Borland Help System, including the source text file format, binary Help file format, and the run-time Help engine.

Accompanying software

The accompanying Examples and Supplementary Software disk contains a number of brief example programs utilizing the information contained in this book. The examples are referenced in the chapter(s) to which each example applies.

A brief disclaimer

The information presented in this guide is for the benefit of advanced developers who wish to take advantage of various internal features and formats of the Borland tools. We hope this information is helpful to you and enhances the usefulness of Borland's language development products. Due to its highly technical nature, this material is not documented in the product manuals, and cannot be supported by our customer service staff.

CHAPTER 1
________________________________________________________________________________

C++ object mapping

This chapter describes how Turbo C++ and Borland C++ handle memory for C++ objects.

The following applies to both the 16-bit (segmented address space) and the 32-bit versions of BCC. Wherever the text mentions a near or far pointer, the description applies to the 16-bit version; substitute a 32-bit (flat) pointer for the 32-bit version. Where the text describes separate near and far flavors of the same data structure, the 32-bit product uses a single version with 32-bit flat pointers.

Nonstatic data members

Borland C++ compilers allocate space for nonstatic data members in order of declaration, regardless of access specifiers. When the word alignment compiler option is turned on, all members larger than 1 byte are aligned on a word boundary. (The 32-bit compiler can align members on either a 2-byte or a 4-byte boundary, depending on the state of the alignment options and/or the presence of #pragma pack.)
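The effect of the alignment setting on member offsets can be illustrated with a minimal sketch. It assumes a present-day compiler that accepts #pragma pack(push, n) and provides a 4-byte std::int32_t; it does not use Borland's word alignment switch, and the struct names PackedByte and PackedWord are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Byte packing: members are laid out back to back, with no padding.
#pragma pack(push, 1)
struct PackedByte {
    char         tag;    // offset 0
    std::int32_t value;  // offset 1 under 1-byte packing
};
#pragma pack(pop)

// Word packing: members larger than 1 byte start on a 2-byte boundary.
#pragma pack(push, 2)
struct PackedWord {
    char         tag;    // offset 0
    std::int32_t value;  // offset 2 under 2-byte packing (1 pad byte before it)
};
#pragma pack(pop)

// Members appear in declaration order; only the padding differs.
static_assert(offsetof(PackedByte, value) == 1, "byte packing adds no padding");
static_assert(offsetof(PackedWord, value) == 2, "word packing pads to offset 2");
static_assert(sizeof(PackedByte) == 5, "1 + 4 bytes");
static_assert(sizeof(PackedWord) == 6, "1 + 1 pad + 4 bytes");
```

Both structs keep their members in declaration order; the packing value only changes where padding is inserted.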
Nonvirtual base classes

Nonvirtual base class members, including compiler-defined members such as vtable pointers, always precede any derived class members, and are allocated in order of declaration, as shown in the following example. Padding is inserted if dictated by the state of compiler alignment options.

    class B
    {
        int b1, b2;
    };

    class D:B
    {
        int d;
    };

The following diagram represents an instance of D:

    ┌─────────────────┐
    │ B::b1           │
    ├─────────────────┤
    │ B::b2           │
    ├─────────────────┤
    │ D::d            │
    └─────────────────┘

Virtual base classes

At the point where a base would occur in an object if it weren't virtual (as B does in the previous example), a virtual base class pointer is stored instead. All of the virtual bases for an object follow all of the nonvirtual bases as well as the derived class, in the order of construction as specified by the language.

The virtual base class pointer is always a 16-bit offset pointer, because a class instance can't span a segment boundary. A compiler option offered for backward compatibility with previous releases of Turbo C++ and Borland C++ allows the virtual base class pointer to be either a near or a far pointer, depending on the size of the this pointer for that class (16-bit compiler only; with the 32-bit compiler, the virtual base pointers are always flat 32-bit pointers).

The compiler will insert a hidden 'unsigned int' (i.e. 16-bit for the 16-bit compiler, 32-bit for the 32-bit compiler) displacement member immediately preceding the virtual base class sub-object when the following conditions exist:

    - a class has a user-defined constructor, a user-defined destructor, or both
    - the derived class overrides a virtual function defined in one of its virtual bases

The displacement member always equals zero, with the following exception, which occurs during construction/destruction of the derived class object: if the derived object is embedded in another class and the virtual base class isn't at the same offset from the derived class as it would be in an object of the derived class, then the displacement member is nonzero. The nonzero displacement member is then used in virtual function thunks to ensure that a correct value is passed to the virtual function for the this parameter.

For compatibility with older versions of Turbo C++ and Borland C++, a compiler option disables the addition of the hidden displacement member on a per-class basis.

The compiler ensures that the derived class of a virtual base with another virtual base has the 'indirect' virtual base as its virtual base for the following reasons:

    - to represent member pointers capable of pointing to members of virtual base classes in a compact and efficient way
    - to limit 'derived*' to 'base*' casts to just one virtual base class pointer indirection

The compiler adds such virtual base classes following any user-specified base classes, in the order of construction, but the addition occurs only when the particular virtual base can't already be reached from the derived class through only one level of virtual inheritance. The presence of compiler-added virtual base classes doesn't have side effects such as changing visibility rules. A compiler-added virtual base class is used for casts of pointers and for pointers to members of the virtual base. The representation of member pointers is discussed later.
The following example shows the declaration of the simplest virtual base:

    class VB
    {
        int vb;
    };

    class D:virtual VB
    {
        int d;
    };

The instance of D has the following layout:

    ┌─────────────────┐
    │ VB sub-obj ptr  ╞════╗
    ├─────────────────┤    ║
    │ D::d            │    ║
    ├─────────────────┤<═══╝
    │ VB::vb          │
    └─────────────────┘

The following example shows the declaration of an indirect (or doubly) virtual base:

    class VB1
    {
        int vb1;
    };

    class VB2
    {
        int vb2;
    };

    class A:virtual VB1
    {
        int a;
    };

    class B:virtual VB2
    {
        int b;
    };

    class C:virtual VB2
    {
        int c;
    };

    class D:virtual A, virtual B, C
    {
        int d;
    };

An instance of D has the following layout. Labels on the left mark where each sub-object begins; the arrow to the right of each sub-object pointer names the sub-object it points to.

    D ──────>  ┌─────────────────┐
               │ A sub-obj ptr   │ ══> D::A
               ├─────────────────┤
               │ B sub-obj ptr   │ ══> D::B
    D::C ───>  ├─────────────────┤
               │ VB2 sub-obj ptr │ ══> D::VB2
               ├─────────────────┤
               │ C::c            │
               ├─────────────────┤
    ***** ──>  │ VB1 sub-obj ptr │ ══> D::VB1
               ├─────────────────┤
               │ D::d            │
    D::VB1 ─>  ├─────────────────┤
               │ VB1::vb1        │
    D::A ───>  ├─────────────────┤
               │ VB1 sub-obj ptr │ ══> D::VB1
               ├─────────────────┤
               │ A::a            │
    D::VB2 ─>  ├─────────────────┤
               │ VB2::vb2        │
    D::B ───>  ├─────────────────┤
               │ VB2 sub-obj ptr │ ══> D::VB2
               ├─────────────────┤
               │ B::b            │
               └─────────────────┘

The virtual base VB2 is reachable from D through only one level of virtual inheritance due to the base class C; therefore, VB2 isn't added by the compiler as a virtual base of D. The diagram shows the VB1 base pointer (see *****), which is added by the compiler.
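The simplest case above (class D with the single virtual base VB) can be modeled by hand to show the pointer indirection. This is only an illustrative sketch: the names VBPart, DModel and vb_of are hypothetical, an ordinary C++ pointer stands in for the 16-bit offset pointer the text describes, and a real compiler builds this layout itself.

```cpp
// Hand-built model of the instance layout of
//   class VB { int vb; };   class D : virtual VB { int d; };
struct VBPart {
    int vb;           // VB::vb
};

struct DModel {
    VBPart* vb_ptr;   // virtual base class pointer, stored where VB would otherwise sit
    int     d;        // D::d
    VBPart  vb_part;  // the VB sub-object, placed after D's own members

    DModel() : vb_ptr(&vb_part), d(0), vb_part{0} {}

    // A plain member-wise copy would leave vb_ptr pointing into the source object.
    DModel(const DModel&) = delete;
    DModel& operator=(const DModel&) = delete;
};

// Every access to VB::vb from a D goes through one pointer indirection.
inline int& vb_of(DModel& obj) {
    return obj.vb_ptr->vb;
}
```

Because the base is always reached through vb_ptr, the same access code keeps working when the VB sub-object sits at a different distance from D, as it does when D is itself a base of a larger class.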
The following example shows a hidden displacement member:

    class B
    {
        B();
        virtual void f();
        int b;
    };

    class X:virtual B
    {
        int x;
    };

    class Y:X
    {
        Y();
        virtual void f();
        int y;
    };

    class Z:Y
    {
        int z;
    };

An instance of Y has the following layout:

    ┌──Y──> ┌─────────────────┐
    │       │ B sub-obj ptr   ├────>───┐
    │       ├─────────────────┤        │
    │       │ X::x            │        │
    │       ├─────────────────┤        │
    │       │ X/Y vtable ptr  │        │
    │       ├─────────────────┤        │
    │       │ Y::y            │        │
    │       ├─────────────────┤        │
    │       │ (displacement)  │        │
    └──B──> ├─────────────────┤ <──────┘
            │ B::b            │
            ├─────────────────┤
            │ B vtable ptr    │
            └─────────────────┘

An instance of Z has the following layout:

    ┌──Y──> ┌─────────────────┐
    │       │ B sub-obj ptr   ├────>───┐
    │       ├─────────────────┤        │
    │       │ X::x            │        │
    │       ├─────────────────┤        │
    │       │ X/Y vtable ptr  │        │
    │       ├─────────────────┤        │
    │       │ Y::y            │        │
    │       ├─────────────────┤        │
    │       │ Z::z            │        │
    │       ├─────────────────┤        │
    │       │ (displacement)  │        │
    └──B──> ├─────────────────┤ <──────┘
            │ B::b            │
            ├─────────────────┤
            │ B vtable ptr    │
            └─────────────────┘

As shown by the diagrams, the displacement between the B sub-object and the base of Y differs by 2 bytes, depending on whether the object is of type Y or Z. The displacement member will be set to -2 in the constructor Z::Z before X::X is called. The displacement member is reset to zero after the call to X::X has been completed. The virtual table thunk for the Y::f entry in the B part of X/Y's vtable will adjust the value of this by the current value of the displacement member, which is zero unless the current object is being constructed or destructed.

Empty classes

A class without any nonstatic data members is allocated 1 or 2 bytes (1, 2, or 4 bytes with the 32-bit compiler), depending on the alignment options selected. Exception: if the class has virtual functions, the instance simply consists of the vtable pointer, and no padding is added.

Addressing of class instances and this

The address of a class instance is always the first byte allocated.
For derived classes, the address is typically that of the first member of the 'root' class (the base-most base class).

The size of the this pointer defaults to the default pointer size for the memory model in effect. Declaring the class itself as near or far overrides this default. A derived class inherits the size of this from the first base, and all of the following bases (if any) must use the same this size.

Virtual table pointers

When a vtable pointer is introduced in a class, it's inserted before any user-defined members of that class and after any base class sub-objects. A compiler option forces the vtable pointer member to be added after all user-defined members of the class, allowing many C++ structures with virtual function members to be easily shared with other languages, such as C.

In the huge memory model, the vtable pointer is always far, while in all other memory models, the vtable pointer defaults to near. Declaring a class as huge or _export has the following consequences:

    - overrides the default, making the vtable pointer far
    - allocates the vtable either in the code segment or in a data segment specified by compiler options

A vtable pointer in a derived class is shared with the first base if the base is nonvirtual and if it already contains a vtable pointer.

Virtual tables

A virtual table is a table of function pointers; near and far pointers can be arbitrarily mixed. No padding is added to align the far pointers.

Virtual function calls, virtual thunks

When a virtual function is called using the virtual mechanism, the value passed for this always points to the appropriate sub-object by the time execution arrives at the virtual function.

When multiple inheritance is involved, any virtual functions inherited from a virtual base (or from a base that isn't the first base) and overridden in the derived class are dispatched through a virtual thunk.
The pointer in the virtual table points to this thunk, which then adjusts the this value on the stack and jumps to the function body itself.

A virtual thunk in a virtual base's vtable adds the value of the hidden 16-bit (32-bit for the 32-bit compiler) displacement member (described previously) to the this value under the following condition: a virtual function overrides a virtual in a virtual base class containing one or more user-defined constructors or a user-defined destructor. This technique ensures passage of the correct this value to the function regardless of the relative distance between the derived class and the virtual base. That distance might differ from the distance in a 'pure' instance of the derived class.

Calling conventions for member functions

The default calling convention for member functions is cdecl, with user arguments pushed right-to-left, followed by the this pointer; the caller pops the arguments from the stack.

With the pascal calling convention, any user arguments are pushed first from left to right, this is again pushed last, and the callee pops the arguments from the stack. A compiler option is available that passes this as the first argument to pascal member functions (for compatibility with other compilers and previous versions of Turbo C++ and Borland C++).

Pointers to class members

There are three general categories of member pointers: single inheritance (SI), multiple inheritance without virtual bases (MI), and the most general (VB). See the Borland C++ User's Guide for more information on how the compiler (or the user) chooses the effective representation for a specific member pointer type.

Sometimes when a VB member pointer, which is capable of pointing to members of virtual bases, is cast to another member pointer type, the cast can't be carried out using inline code, and the compiler generates a call to an RTL helper function, which is discussed in more detail later.
Note that for all categories of member pointers, a NULL pointer value has all the fields of the member pointer equal to zero. When testing a member pointer value for NULL, the compiler might test only some fields of the member pointer.

Pointers to data members

The following internal representations of pointers to data members cover two categories of pointers: unrestricted pointers, which can point to any member of any class, and pointers that can't point to members of virtual base classes:

SI/MI data member pointer
______________________________________

    size_t member_offset;
______________________________________

The SI/MI data member pointer holds the offset, within the class instance, of the member being pointed to, plus one; adding one allows zero to be used as a NULL pointer.

VB data member pointer
______________________________________

    size_t member_offset;
    size_t vbcptr_offset;
______________________________________

The VB data member pointer consists of two offsets. If vbcptr_offset is nonzero, the pointer points to a virtual base class member, and vbcptr_offset gives the offset of the virtual base class pointer within the object, plus one. member_offset then specifies the offset within that virtual base class to the member being pointed to. When vbcptr_offset is zero, the pointer is treated just like the SI/MI data member pointer.

Pointers to function members

Pointers to member functions resemble pointers to data members, except they always contain a function pointer. The function pointer points either to the member function itself, if the function is nonvirtual, or to a virtual call thunk that uses the virtual mechanism to transfer control to an appropriate virtual function. The compiler creates such thunks automatically.

When calling through a pointer to a member function, the appropriate value must be passed to the function for the this parameter. When calling using the ->* operator, the value equals the object pointer, while when calling using the .* operator, the value equals the address of the object.
The address must be adjusted based on additional fields in the member pointer:

SI function member pointer
______________________________________

    void (*func_addr)();
______________________________________

The SI function member pointer contains the address of the member function. The function pointer is appropriately typed, based on the member pointer type.

MI function member pointer
______________________________________

    void (*func_addr)();
    size_t member_offset;
______________________________________

The MI function member pointer adjusts the this value passed to the member function by member_offset - 1.

VB function member pointer
______________________________________

    void (*func_addr)();
    size_t member_offset;
    size_t vbcptr_offset;
______________________________________

The VB function member pointer adjusts the this value passed to the member function with an algorithm similar to the one that adjusts offsets for VB data member pointers.

Static data members

Static data members default to near in all memory models except the huge model; however, static data members of classes declared _export always default to far in all memory models.

__export/__import classes

Declaring a class __export causes all of its noninline member functions and static data members to be exported. It also makes the vtable pointer far and allocates the virtual table for the class in the code segment. Moreover, declaring a class __export or __import causes all of the static data members and member functions of the class to default to far.

Passing classes by value

When a function takes an argument whose class has constructors, the actual argument value is copy-constructed in its place on the stack, and the called routine calls the destructor for the argument if its class has a destructor.

For compatibility with older versions of Turbo C++ and Borland C++, a compiler option causes the compiler to convert such class arguments to reference-to-class arguments; the compiler then creates temporary storage at the calling site to hold the argument value.
Initialization and finalization of nonlocal static objects

The compiler initializes and finalizes nonlocal static objects in each compilation as required. The functions included for initialization and finalization are registered through the standard Turbo C++ and Borland C++ #pragma startup/exit mechanism.

Conventions for constructors and destructors

When the compiler passes a hidden parameter in addition to this (for example, when calling a constructor), the parameter is passed as if it were to the right of this and to the left of the first user argument, if such an argument exists.

Constructors

The compiler passes a constructor the address of the object memory to be constructed, or it passes a zero for this, in which case the constructor allocates the memory for the object through operator new. If the allocation fails, the constructor immediately returns zero; in all other cases, the constructor returns the address of the object constructed.

The compiler gives constructors for classes with any virtual bases (direct or indirect) an extra int parameter to indicate the following action:

    - A zero means the constructor should construct all virtual base classes. (The class is known to be the most-derived class, and the location of all virtual bases within the object is known at compile time.)
    - A nonzero means virtual bases have already been constructed by a derived class constructor.

Destructors

A destructor tests this for NULL before taking other action on an object. If this is NULL, the destructor immediately returns.

All destructors are passed an extra int parameter that contains two bit flags:

    0x01    When this bit is on, the destructor calls operator delete to deallocate the memory taken up by the object, then the destructor returns.
    0x02    When this bit is on, all virtual bases are destroyed. This bit is only used for classes with virtual bases.
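The constructor and destructor conventions above can be sketched as ordinary free functions. This is a minimal illustration under stated assumptions, not the compiler's generated code: Point, Point_ctor, Point_dtor and the enumerator names are hypothetical, and only the flag values 0x01 and 0x02 come from the text. Point has no virtual bases, so the 0x02 bit has no work to do here.

```cpp
#include <new>

// Flag values taken from the text; the enumerator names are hypothetical.
enum DtorFlags {
    kDeallocate          = 0x01,  // call operator delete after destruction
    kDestroyVirtualBases = 0x02   // also destroy virtual bases (unused for Point)
};

struct Point {
    int x;
    int y;
};

// Constructor convention: a null `self` asks the constructor to allocate.
Point* Point_ctor(Point* self, int x, int y) {
    if (self == nullptr) {
        self = static_cast<Point*>(::operator new(sizeof(Point), std::nothrow));
        if (self == nullptr) {
            return nullptr;  // allocation failed: return zero immediately
        }
    }
    self->x = x;
    self->y = y;
    return self;  // otherwise return the address of the constructed object
}

// Destructor convention: test for null first, then honor the flag bits.
void Point_dtor(Point* self, int flags) {
    if (self == nullptr) {
        return;
    }
    // Member and base clean-up would run here; Point has nothing to clean up.
    if (flags & kDeallocate) {
        ::operator delete(self);
    }
}
```

A stack object would be handled as Point_ctor(&p, 1, 2) followed by Point_dtor(&p, 0), while a heap object would be created with Point_ctor(nullptr, 3, 4) and released with Point_dtor(ptr, kDeallocate).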
RTL helper functions

The run-time library supplies several helper functions to the compiler for allocating, deleting, and copying certain arrays of classes.

The following functions, _vector_apply_ and _vector_applyv_, have "C" linkage.

    extern "C" void _vector_apply_
    (
        void far * dest,    // address of destination array
        void far * src,     // address of source array
        size_t     size,    // size of each object
        unsigned   count,   // number of objects
        unsigned   mode,    // type of function to call
        ...                 // operator=/copy-constructor address here
    );

    extern "C" void _vector_applyv_
    (
        void far * dest,
        void far * src,
        size_t     size,
        unsigned   count,
        unsigned   mode,
        ...
    );

_vector_apply_ and _vector_applyv_ assign or copy-construct the elements of an array of class type. Since the operator= or the copy-constructor might be a near or far function, and take a near or far this value, mode is passed to determine how to cast this. A near pointer must be passed for near functions and a far pointer for far functions, and it's impossible to determine the argument type until runtime; consequently, varargs is used to resolve the problem. The compiler guarantees that source and destination are both near or both far.

The version with the v suffix passes a second argument of zero for copy-constructors of classes with virtual bases.

The following list shows the interpretation of the mode for _vector_apply_ and _vector_applyv_:

    far function    0x01
    pascal call     0x02
    far pointer     0x04

The following functions, _vector_new_ and _vector_vnew_, which return near pointers to void, have C++ linkage. They are used only in the tiny, small, and medium memory models:

    void near * _vector_new_
    (
        void near * ptr,    // address of array (0 means allocate)
        size_t      size,   // size of each object
        unsigned    count,  // how many objects
        unsigned    mode,   // mode bits (see below)
        ...                 // constructor address passed here
    );

    void near * _vector_vnew_
    (
        void near * ptr,
        size_t      size,
        unsigned    count,
        unsigned    mode,
        ...
    );

The following functions, which return far pointers to void, exist in all memory models:

    void far * _vector_new_
    (
        void far *    ptr,
        size_t        size,
        unsigned long count,
        unsigned      mode,
        ...
    );

    void far * _vector_vnew_
    (
        void far *    ptr,
        size_t        size,
        unsigned long count,
        unsigned      mode,
        ...
    );

The following list shows the interpretation of the mode for _vector_new_ and _vector_vnew_:

    far function                0x01
    pascal call                 0x02
    far pointer                 0x04
    store element count         0x10
    huge array (array > 64K)    0x40

The _vector_new_ and _vector_vnew_ routines construct arrays of class type. If ptr is NULL, the routines allocate the space for the array. If mode has 0x10 set, allocated space includes a count field stored at the beginning. If mode has 0x40 set, the pointer returned must be adjusted to prevent a class from crossing the 64K boundary, and the address passed back is adjusted accordingly.

Since the constructor for the class might be a near or a far function, and take a near or far this value, mode is passed to allow correct casting. A near pointer must be passed for near functions and a far pointer for far functions, and it's impossible to determine the argument type until runtime; consequently, varargs is used to resolve the problem.

The far versions of _vector_new_ and _vector_vnew_ are used in the small data memory models for arrays of far classes, regardless of whether or not they're huge.

The far and near versions of _vector_vnew_ pass a second argument, zero, to the constructor. These versions are used for classes with virtual bases.

The following version of function _vector_delete_ is used only in the tiny, small, and medium memory models:

    void _vector_delete_
    (
        void near * ptr,    // address of array
        size_t      size,   // size of each object
        unsigned    count,  // how many objects
        unsigned    mode,   // how to call
        ...                 // destructor address passed here
    );

The following version of function _vector_delete_ exists in all memory models:

    void _vector_delete_
    (
        void far *    ptr,
        size_t        size,
        unsigned long count,
        unsigned      mode,
        ...
    );

The following list shows the interpretation of the mode for _vector_delete_:

    far function                0x01
    pascal call                 0x02
    far pointer                 0x04
    deallocate                  0x08
    stored element count        0x10
    huge array (array > 64K)    0x40

The _vector_delete_ routines destroy arrays of class type. If mode has 0x08 set, the routines deallocate the space for the array after destroying the elements. When mode has 0x18 set, causing deallocation to occur and count to be used, the count is retrieved from the count field stored in a 16-bit word just below the array.

Since the destructor for the class might be a near or a far function, and take a near or far this value, mode is passed to allow correct casting. A near pointer must be passed for near functions and a far pointer for far functions, and it's impossible to determine the argument type until runtime; consequently, varargs is used to resolve the problem.

The far version of _vector_delete_ is used in the small data memory models for arrays of far classes, regardless of whether or not they're huge.

Name mangling

There are four basic forms of encoded names in Borland C++:

1.  @className@functionName$args

    This encoding denotes a member function functionName belonging to class className and having arguments args. Class names are encoded directly. The following example shows a className in an encoded name:

        @className@...
    The class name may be followed by a single digit; the digit value contains the following bits (these can be combined):

        0x01    the class uses a far vtable
        0x02    the class uses the -po calling convention
        0x04    the class has an RTTI-compatible virtual table; this bit is only used when encoding the name of the virtual table for the class

    The digit is encoded as an ASCII representation of the bit mask value, with 1 subtracted (so that, for example, the class prefix for a class 'foo' that uses far vtables would be '@foo@0').

    See the next section on the encoding of function names and argument types.

2.  @functionName$args

    This form of encoding denotes a function functionName with arguments args.

3.  @className@dataMember

    This form of encoding denotes a static data member dataMember belonging to class className. Names of classes and data members are encoded directly. The following example shows a member myMember in class myClass:

        @myClass@myMember

4.  @className@

    This name denotes a virtual table for a class className. As mentioned previously, class names are encoded directly.

Encoding of nested and template classes

The following form encodes a name of a class lexically nested within another class:

    @outer@inner@...

A template instance class encodes the name of the template class, along with the actual template arguments, in the following way:

    %templateName$arg1$arg2 ..... $argn%

Each actual argument starts with a letter, specifying the kind of argument it is:

    t    type argument
    i    nontype integral argument
    g    nontype nonmember pointer argument
    m    nontype member pointer argument

The first letter is followed by the encoded type of the argument. For a type argument, this code also represents the argument's actual value. For other kinds of arguments, the type code is followed by $ and the argument value, encoded as an ASCII number or symbol name.
An instance of a template whose name is vector is encoded as shown in the following example:

    %vector$tl$ii$100%

Here tl is a type argument of type long, and ii$100 is an integral argument of type int with the value 100.

Encoding of function names

The encoded functionName might denote an ordinary function name, a special function such as a constructor or destructor, an overloaded operator, or a type conversion.

Ordinary functions

Ordinary function names are encoded directly, as shown in the following examples:

    foo(int)        -->  @foo$qi
    sna::foo(void)  -->  @sna@foo$qv

The string $qi denotes the integer argument of function foo; $qv denotes no arguments in sna::foo. Argument encoding is covered in more detail later.

Constructors, destructors, and overloaded operators

Constructors, destructors, and overloaded operators are encoded with a $b character sequence, followed by a character sequence from the following table:

    Character
    Sequence     Meaning
    _____________________________________
    ctr          constructor
    dtr          destructor
    add          +
    adr          &
    and          &
    arow         ->
    arwm         ->*
    asg          =
    call         ()
    cmp          ~
    coma         ,
    dec          --
    dele         delete
    div          /
    eql          ==
    geq          >=
    gtr          >
    inc          ++
    ind          *
    land         &&
    lor          ||
    leq          <=
    lsh          <<
    lss          <
    mod          %
    mul          *
    neq          !=
    new          new
    not          !
    or           |
    rand         &=
    rdiv         /=
    rlsh         <<=
    rmin         -=
    rmod         %=
    rmul         *=
    ror          |=
    rplu         +=
    rrsh         >>=
    rsh          >>
    rxor         ^=
    sub          -
    subs         []
    xor          ^
    nwa          new []
    dla          delete []
    _____________________________________

The following examples show names encoded with the character sequences add, ctr, and dtr from the previous table:

    operator+(int)  -->  @$badd$qi
    plot::plot()    -->  @plot@$bctr$qv
    plot::~plot()   -->  @plot@$bdtr$qv

The string $qv denotes no arguments in the plot constructor or destructor.

Type conversions

Encoding of type conversions is accomplished with the $o character sequence, followed by the distinguishing return type of the conversion as part of the function name. The return type follows the rules for argument encoding, explained later.
The lack of arguments in a conversion is made explicit in the mangling by adding $qv to the end of the encoded string. Example:

    foo::operator int()     -->  @foo@$oi$qv
    foo::operator char *()  -->  @foo@$opzc$qv

The i following $o in the first example denotes int; the pzc in the second example denotes a near pointer (p) to a signed (z) char (c).

Encoding of arguments

The number and combinations of function arguments make argument encoding the most complex aspect of name mangling. Argument lists for functions begin with the characters $q. Type qualifiers are then encoded as shown in the following table:

    Character
    sequence     Meaning
    _________________________________
    up           huge
    ur           _seg
    u            unsigned
    z            signed
    x            const
    w            volatile

Encoding of built-in types follows that for applicable type qualifiers, in accordance with the following table:

    Character
    sequence     Meaning
    _________________________________
    v            void
    c            char
    s            short
    i            int
    l            long
    f            float
    d            double
    g            long double
    e            ...
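
The rules for ordinary functions and built-in types can be sketched as follows. This is only an illustration, assuming unqualified built-in argument types; builtinCode and mangleOrdinary are hypothetical helper names and are not part of any Borland tool.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Returns the one-letter code for a built-in type, taken from the table
// above. Unknown type names yield an empty string.
std::string builtinCode(const std::string& type) {
    static const std::map<std::string, std::string> codes = {
        {"void", "v"},   {"char", "c"},        {"short", "s"},
        {"int", "i"},    {"long", "l"},        {"float", "f"},
        {"double", "d"}, {"long double", "g"}, {"...", "e"}};
    std::map<std::string, std::string>::const_iterator it = codes.find(type);
    return it == codes.end() ? std::string() : it->second;
}

// Hypothetical helper: mangles an ordinary function whose arguments are all
// unqualified built-in types. className may be empty for a global function.
std::string mangleOrdinary(const std::string& className,
                           const std::string& funcName,
                           const std::vector<std::string>& argTypes) {
    std::string out = "@";
    if (!className.empty()) out += className + "@";
    out += funcName + "$q";               // argument lists begin with $q
    if (argTypes.empty()) return out + "v";   // no arguments: $qv
    for (const std::string& type : argTypes) {
        const std::string code = builtinCode(type);
        assert(!code.empty() && "only built-in types are handled here");
        out += code;
    }
    return out;
}
```

Called with the earlier examples, mangleOrdinary("", "foo", {"int"}) yields @foo$qi and mangleOrdinary("sna", "foo", {}) yields @sna@foo$qv. Qualifiers, pointers, and class names are left out on purpose.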
Encoding of non-built-in types follows that for applicable type qualifiers, in accordance with the following table:

    Character
    sequence               Meaning
    ______________________________________________________________
    (one or more digits)   an enumeration or class name
    p                      near *
    r                      near &
    m                      far &
    n                      far *
    a                      array
    M                      member pointer (followed by class and base type)

The appearance of one or more digits indicates that an enumeration or class name follows; the value of the digits denotes the length of the name, as shown in the following examples:

    foo::myfunc(myClass near&)       is mangled as  @foo@myfunc$qr7myClass
    foo::myfunc(anotherClass near&)  is mangled as  @foo@myfunc$qr12anotherClass

A character x or w may appear after p, r, m, or n to denote a constant or volatile type qualifier, respectively. The character q appearing after one of these characters denotes a function with arguments that follow in the encoded name, up to the appearance of a $ character, and finally a return type is encoded. The following examples show how these encoding rules are applied:

    @foo@myfunc$qpxzc    is unmangled to  foo::myfunc(const char near*)
    @func1$qxi           is unmangled to  func1(const int)
    @foo@myfunc$qpqii$i  is unmangled to  foo::myfunc(int (near*)(int,int))

Array types are encoded as a, followed by a dimension encoded as an ASCII decimal number and a $, and finally the element type, as shown in the following example:

    foo( int (*x)[20] )  is mangled as  @foo$qpa20$i

Encoded arguments are concatenated in the order of appearance in the function call. When several function arguments have the same non-built-in type, the repeated arguments are encoded as the character t followed by an ASCII character.
The ASCII character, ranging from 31H to 39H and from 61H to 7FH (1 to 9 and a onward), denotes which argument type to duplicate, as shown in the following example:

    @plot@func1$qdddiiilllpzctata

is unmangled to

    plot::func1(double, double, double, int, int, int, long, long, long,
                char near*, char near*, char near*)

The two ta character sequences at the end of the encoded name each duplicate the tenth argument, encoded as pzc.

Dynamically dispatchable virtual tables

The DDVT always precedes the 'regular' virtual table for the given class. The DDVT is located at negative offsets from the virtual table pointer. The following layout shows the format of the DDVT:

    void (far *fpt[count])();
    unsigned idt[count];
    unsigned count;
    void *basep;
    /* the regular virtual table starts here: */
    void (*vtab[])();

The fpt and idt tables contain the addresses and IDs, respectively, of all DDVT functions introduced or overridden in the class. The count holds the number of entries in the tables. basep holds the address of the virtual table for the base class, or zero if the class has no base; the size of the base class pointer is the same as the virtual table pointer for the class. The pointer is a far pointer for huge classes.
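
The text does not describe how a DDVT is searched at run time. The sketch below is therefore only a plausible model: it assumes that a lookup by ID searches the idt of the class first and then follows basep through the base classes. DdvtModel and ddvtFind are hypothetical names, strings stand in for the far function addresses in fpt, and the model ignores the real negative-offset memory layout.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one class's DDVT. Names stand in for the far
// function addresses held in fpt; count is implied by the vector sizes.
struct DdvtModel {
    std::vector<std::string> fpt;  // one entry per DDVT function
    std::vector<unsigned> idt;     // matching IDs, in the same order as fpt
    const DdvtModel* basep;        // table of the base class, or nullptr
};

// Assumed lookup rule (not stated in the text): search this class first,
// then each base class in turn. Returns an empty string if no class in the
// chain has an entry for the ID.
std::string ddvtFind(const DdvtModel* table, unsigned id) {
    for (; table != nullptr; table = table->basep) {
        assert(table->fpt.size() == table->idt.size());
        for (std::size_t i = 0; i < table->idt.size(); ++i) {
            if (table->idt[i] == id) return table->fpt[i];
        }
    }
    return std::string();
}
```

Under this assumption, a derived class that overrides an ID shadows the base class entry, while an ID that only the base class defines is still found through basep.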
For example, consider the following two classes:

    struct base {
        virtual f() = [11];
        virtual g() = [22];
        virtual h();
    };

    struct der : base {
        f();
        virtual i() = [33];
        h();
    };

The following table is the DDVT/virtual table for class base:

                 dd  @base@f$qv     ; addr of base::f()
                 dd  @base@g$qv     ; addr of base::g()
                 dw  11             ; ID for f()
                 dw  22             ; ID for g()
                 dw  2              ; 2 entries in DDVT
                 dw  0              ; no base class
    base_vtable:
                 dd  @base@h$qv     ; addr of base::h()

The following table is the DDVT/virtual table for class der:

                 dd  @der@f$qv      ; addr of der::f()
                 dd  @der@i$qv      ; addr of der::i()
                 dw  11             ; ID for f()
                 dw  33             ; ID for i()
                 dw  2              ; 2 entries in DDVT
                 dw  base_vtable    ; base class vtable addr
    der_vtable:
                 dd  @der@h$qv      ; addr of der::h()


CHAPTER 2
_________________________________________________

Object file contents

This chapter covers the comment records sent to the object file by Borland C++ version 4.0. Other Borland compilers may not emit all of the records described here.

The comment records are actually Intel Object Module Format (OMF-86) Comment records with the following specifications:

    Value or length   Description
    _________________________________________________________
    0x88              COMMENT record byte
    2 bytes           Record length
    0x00              A control byte (always zero)
    1 byte            Comment record class (see below)
    n bytes           Data (depends on Comment record class)
    1 byte            Checksum

For fields described in this document, strings are stored as Pascal-style strings with a leading length byte, which might be zero. A zero length byte indicates a null string.

An index is an OMF-86 index field. That is, if the value is below 128, then the index is a byte field with the index value; otherwise, the field is two bytes. The first byte has the high bit set and the remaining bits are the seven high-order bits of the index. The second byte is the low-order 8 bits of the index.

Type indices are the type indices defined for the .EXE file tables.
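
The two field encodings just described, the index and the Pascal-style string, can be read as in the following sketch. The function names readIndex and readPascalString and their signatures are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Reads an OMF-86 index field at data[pos] and advances pos past it.
// Returns false, leaving pos unchanged, if the buffer is too short.
bool readIndex(const std::uint8_t* data, std::size_t size,
               std::size_t& pos, unsigned& value) {
    assert(data != nullptr);
    if (pos >= size) return false;
    const std::uint8_t first = data[pos];
    if (first < 0x80) {  // below 128: the byte is the index value itself
        value = first;
        pos += 1;
        return true;
    }
    if (pos + 1 >= size) return false;
    // High bit set: the low 7 bits are the high-order bits of the index,
    // and the next byte holds the low-order 8 bits.
    value = (static_cast<unsigned>(first & 0x7F) << 8) | data[pos + 1];
    pos += 2;
    return true;
}

// Reads a Pascal-style string: one length byte, then that many characters.
// A zero length byte produces an empty (null) string.
bool readPascalString(const std::uint8_t* data, std::size_t size,
                      std::size_t& pos, std::string& out) {
    assert(data != nullptr);
    if (pos >= size) return false;
    const std::size_t len = data[pos];
    if (pos + 1 + len > size) return false;
    out.assign(reinterpret_cast<const char*>(data + pos + 1), len);
    pos += 1 + len;
    return true;
}
```

Both helpers advance a shared position, so the fields of a comment record can be read one after another in the order given in the record descriptions.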
Immediate indices 0 to 23 refer to scalar types. Type index 0 indicates an unknown type. Any type index higher than 23 indicates the index of a type record defined in the current file. Each type record contains its own index, since the output of type records isn't necessarily in index order. The official Intel-type index fields are always zero, because MS-Link uses them for special purposes.

The order of comment records inside the object file is fairly flexible. Unless the description of a comment record specifies ordering requirements, the comment record might appear anywhere between the module header and module end records. The must appear immediately after the module header record and before any other type records. The compiler identification need not be for Turbo C++, nor is it absolutely necessary that a compiler id record appear at all.

Turbo object file comment records

This section dissects each comment record class. Each entry gives the value of the record class and its name, followed by a description of the record.

0x00 Compiler identification string

    string   A descriptive name reflecting the name of the translator used to generate this object file. For instance, "Turbo Assembler Version 2.0".

0xe0 External symbol type index

    index    The type index of an external symbol. External symbols must be placed one symbol per EXTDEF record. This comment record supplies the type index of the external symbol located just previous to it in the object file.

If the debug information version record 0xf9 appears in the object file, the following fields are present:

    index    The index of the source file that caused the record to be emitted.

    word     The line number of the source code line that caused the record to be emitted. The word is present only if the previous index is nonzero.
If the debug information version record 3.1 appears in the object file, the following fields are present:

First is a 16-bit file index. If this is zero, there is no reference info. If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number and the 7th bit being a toggle that specifies whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, it is a special encoding with the following meaning:

    0xff     Next byte/word is a file index, with an absolute word line number following.
    0xfe     The absolute line number is stored in the next word, and this is a reference.
    0xfd     The absolute line number is stored in the next word, and this is an assignment.

The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

0xe1 Public symbol type index

    index    The type index of a public symbol. Public symbols must be placed one symbol per PUBDEF record. This comment record supplies the type index of the public symbol located just previous to it in the object file.

    byte     If the symbol is a function with a valid BP, the byte has bit 3 set to one (hex 0x8), and the upper four bits set to the number of words between the BP value and the return address.

If the debug information version record (0xf9) appears in the object file, the following fields are present:

    index    Index of the source file that caused this record to be emitted.

    word     The line number of the source code line that caused the record to be emitted. The word is present only if the previous index is nonzero.

If the debug information version record 3.1 appears in the object file, the following fields are present:

First is a 16-bit file index.
If this is zero, there is no reference info. If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number and the 7th bit being a toggle that specifies whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, it is a special encoding with the following meaning:

    0xff     Next byte/word is a file index, with an absolute word line number following.
    0xfe     The absolute line number is stored in the next word, and this is a reference.
    0xfd     The absolute line number is stored in the next word, and this is an assignment.

The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

0xe2 Structure member definition

Typically, all the members of a single structure are written to a single member record. If the number of members is so great that the OMF record exceeds 8K, or if the OMF record exceeds 8K for some other reason, the members of a single structure might be spanned across multiple records. Only the last member of the structure has the terminating bit (the high bit of the first byte) set. No more than one structure can appear in a member definition record.

If the debug version record 0xf9 doesn't appear in the object file, then one or more member definition records for a structure are written immediately before the type record for that structure; otherwise, the structure member definition records must appear after the type for the structure, and after all the types that the member definition records reference.

* A consecutive set of member definition records.
Each record consists of the following information:

1st byte:

    * 0x60   Static member
    * 0x50   Conversion
    * 0x48   Member function, which might be combined with the following bits:
        * 0x01   destructor
        * 0x02   constructor
        * 0x03   static member function
        * 0x04   virtual member function

If none of the previous values are present, then the following interpretation of the byte applies:

    low six bits   If the member is a bit field, this field represents the number of bits in the field; otherwise, the field is set to zero.

    seventh bit    This bit is set to zero if what follows is a normal member, or to one if what follows is a New Offset record.

    high bit       This bit is set to zero if there are more members in the current structure, or to one if this is the last member in the structure.

For normal members the following rule applies:

    string   The member name. A zero byte is used for unnamed members. Since no explicit offset for each member is given, offsets are computed by counting the length of each member. When holes exist from bit fields not filling a byte or when word alignment is used, an unnamed member is emitted. Such a member is always a bit field member with the appropriate number of pad bits. Although the compiler currently behaves according to this description, it accepts non-bit-field unnamed members.

    index    The member type. For conversions, this index specifies the target type of the conversion, for example int for "operator int();".

For New Offset members the following information applies:

    double word   The new byte offset of the records that follow it. The double word allows variant records, since each variant portion can be started with a New Offset member. As a double word, this field is suitable for large structures.

0xe3 Type definition

One type is defined in each type definition record. The format of the type record depends on the type identification (TID) byte. See TID values defined in the EXE debug table format beginning on page 51.
TLINK defines a set of universal scalar types to save space in the object files. For integer range types, the type is stored with the maximum range for each type. If an index of less than twenty-four (decimal) appears in the object file, one of the pre-assigned types is indicated, and no type definition appears in the object file. The following list shows the set of pre-assigned types and their assigned indices:

    Index   Type
    ___________________________________
    1       void
    2       signed char
    4       signed short int
    6       signed long int
    8       unsigned char
    10      unsigned short int
    12      unsigned long int
    14      float
    15      double
    16      long double
    17      Pascal 6-byte real
    18      Pascal boolean
    19      Pascal character type
    21      8-byte signed range
    22      8-byte unsigned range
    23      10-byte value (tbyte)

The fields of the type definition record are as follows:

    index     The index of the type being defined. All types must have a valid index of twenty-four (decimal) or greater, and the indices must be unique within the object file. There's no requirement to write types in any particular order. All of the type indices for a given file form a contiguous block beginning at twenty-four and proceeding to the highest numbered index. Since some types occupy eight bytes and others sixteen bytes in the .EXE file, the TID values requiring sixteen bytes reserve their own type index as well as the next higher type index.

    string    The type name, if any exists. For C, the type name is used only for structure, union, and enum tags. For Pascal, any type might be named.

    word      The size in bytes of the type.

    TID byte  This is the TID of the type being defined. The following list shows the TID values:

    Name              Value   Description
    _____________________________________________________________
    TID_VOID          0x00    Unknown.
    TID_LSTR          0x01    Basic literal string.
    TID_DSTR          0x02    Basic dynamic string.
    TID_PSTR          0x03    Pascal style string.
    TID_SCHAR         0x04    1 byte signed integer range.
    TID_SINT          0x05    2 byte signed integer range.
    TID_SLONG         0x06    4 byte signed integer range.
    TID_SQUAD         0x07    8 byte signed integer.
    TID_UCHAR         0x08    1 byte unsigned integer range.
    TID_UINT          0x09    2 byte unsigned integer range.
    TID_ULONG         0x0A    4 byte unsigned integer range.
    TID_UQUAD         0x0B    8 byte unsigned integer.
    TID_PCHAR         0x0C    Pascal character range (no arithmetic).
    TID_FLOAT         0x0D    IEEE 32-bit real.
    TID_TPREAL        0x0E    Turbo Pascal 6-byte real.
    TID_DOUBLE        0x0F    IEEE 64-bit real.
    TID_LDOUBLE       0x10    IEEE 80-bit real.
    TID_BCD4          0x11    4 byte BCD.
    TID_BCD8          0x12    8 byte BCD.
    TID_BCD10         0x13    10 byte BCD.
    TID_BCDCOB        0x14    COBOL BCD.
    TID_NEAR          0x15    Near pointer.
    TID_FAR           0x16    Far pointer.
    TID_SEG           0x17    Segment pointer.
    TID_NEAR386       0x18    386 32-bit offset pointer.
    TID_FAR386        0x19    386 48-bit far pointer.
    TID_CARRAY        0x1A    C array - 0 based.
    TID_VLARRAY       0x1B    Very Large 0 based array.
    TID_PARRAY        0x1C    Pascal array.
    TID_ADESC         0x1D    Basic array descriptor.
    TID_STRUCT        0x1E    Structure.
    TID_UNION         0x1F    Union.
    TID_VLSTRUCT      0x20    Very Large Structure.
    TID_VLUNION       0x21    Very Large Union.
    TID_ENUM          0x22    Enumerated range.
    TID_FUNCTION      0x23    Function or procedure.
    TID_LABEL         0x24    Goto label.
    TID_SET           0x25    Pascal set.
    TID_TFILE         0x26    Pascal text file.
    TID_BFILE         0x27    Pascal binary file.
    TID_BOOL          0x28    Pascal boolean.
    TID_PENUM         0x29    Pascal enumerated range (no arithmetic).
    TID_PWORD         0x2A    Pword
    TID_TBYTE         0x2B    Tbyte
    TID_SPECIALFUNC   0x2D    Member/Duplicate function
    TID_CLASS         0x2E    C++ Class
    TID_HANDLEPTR     0x30    Handle based ptr
    TID_MEMBERPTR     0x33    Type pointed to by a class member pointer.
    TID_NREF          0x34    Near reference
    TID_FREF          0x35    Far reference
    TID_NEWMEMPTR     0x38    New style member ptr

The format of the remainder of the type record depends on the TID byte, as follows:

Simple types

    TID_VOID     TID_FLOAT     TID_BCD8     TID_TFILE
    TID_LSTR     TID_TPREAL    TID_BCD10    TID_BOOL
    TID_DSTR     TID_DOUBLE    TID_ADESC    TID_SCHAR
    TID_SQUAD    TID_LDOUBLE   TID_STRUCT   TID_PWORD
    TID_UQUAD    TID_BCD4      TID_UNION    TID_TBYTE

Pascal string type

    TID_PSTR

    byte     The maximum size of the string.
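
The split between pre-assigned type indices and type record indices, described at the start of the 0xe3 entry, can be expressed as a small lookup. The sketch below simply transcribes the index list given there; the function names isTypeRecordIndex and preassignedTypeName are hypothetical.

```cpp
#include <cassert>
#include <string>

// True when the index names a type record defined in the current object
// file, that is, when it is twenty-four (decimal) or greater.
bool isTypeRecordIndex(unsigned index) { return index >= 24; }

// Returns the name of a pre-assigned type for an index below 24, or an
// empty string for indices that the list of pre-assigned types omits.
std::string preassignedTypeName(unsigned index) {
    assert(!isTypeRecordIndex(index));
    switch (index) {
        case 0:  return "unknown";
        case 1:  return "void";
        case 2:  return "signed char";
        case 4:  return "signed short int";
        case 6:  return "signed long int";
        case 8:  return "unsigned char";
        case 10: return "unsigned short int";
        case 12: return "unsigned long int";
        case 14: return "float";
        case 15: return "double";
        case 16: return "long double";
        case 17: return "Pascal 6-byte real";
        case 18: return "Pascal boolean";
        case 19: return "Pascal character type";
        case 21: return "8-byte signed range";
        case 22: return "8-byte unsigned range";
        case 23: return "10-byte value (tbyte)";
        default: return std::string();  // no entry listed for this index
    }
}
```

A reader of type indices would first test isTypeRecordIndex and only then decide whether to look for a type definition record or to use the pre-assigned name.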

Labels

    TID_LABEL

    byte     Zero if near, one if far.

Integral range types

    TID_SCHAR    TID_SLONG    TID_UINT     TID_PCHAR
    TID_SINT     TID_UCHAR    TID_ULONG

The integral range types are a hierarchy of related types that form a tree. The root of the tree is the general type, which is stored explicitly as a range. The parent type is zero and the lower and upper bounds are the entire range of values storable in the size of memory indicated by the TID. The bound values are interpreted as signed or unsigned according to the TID. The Pascal character TID (TID_PCHAR) is stored as an unsigned character-sized range, except arithmetic isn't allowed on objects of Pascal character type. For all types, a 4-byte upper and lower bound value allows standard treatment of range checking. The sub-fields are stored as shown in the following list:

    * index         The parent type index
    * double word   The lower bound of the range
    * double word   The upper bound of the range

Cobol-style BCD

    TID_BCDCOB

    byte     The position of the decimal point. The number of total digits is determined from the size, using 2 digits per byte, except for the last byte, which has one digit and a sign. The decimal position is the number of digits to the right of the decimal point.

Pointer types

    TID_NEAR     TID_SEG       TID_FAR386   TID_FREF
    TID_FAR      TID_NEAR386   TID_NREF

All pointer types have an index field for the pointed-to type. All have an additional byte field following the pointed-to type field that consists of extra information as follows:

    TID_NEAR and TID_NEAR386

    The segment base of the pointer:

        Value   Segment register
        ______________________________________
        0x0     segment register unspecified.
        0x1     ES relative
        0x2     CS relative
        0x3     SS relative
        0x4     DS relative
        0x5     FS relative
        0x6     GS relative

    TID_FAR and TID_FAR386

        0x0     far pointer arithmetic (no segment adjustments).
        0x1     huge pointer arithmetic (segment adjustments to avoid offset wraparound).

    TID_SEG

        0x0     ignored

    TID_NREF

        0x0     ignored

    TID_FREF

        0x0     ignored

Array types

    TID_CARRAY

    index    The index of the element type. The dimension of the array is determined by dividing the size of the overall array by the size of each element. No padding is assumed between array elements.

    TID_VLARRAY

    word     The upper 16 bits of the array size. This word is placed so that the normal type size field and this one can be considered a double word size.

    index    The index of the element type. The dimension of the array is determined as it is for normal C arrays.

    TID_PARRAY

    index    The element type.

    index    The type of the dimension. The number of elements in the array and their indices are determined by the dimension type, which is normally some sort of integral or enum range.

Very large structure types

    TID_VLSTRUCT and TID_VLUNION

    word     The upper 16 bits of the size of the struct or union. This word is placed so that the upper 16 bits and the normal type size can be considered a double word size.

Enumerated types

    TID_ENUM and TID_PENUM

    index    The index of the parent type.

    word     The lower bound of the range (considered a signed integer range).

    word     The upper bound of the range (considered a signed integer range).

If the debug information version record (0xf9) has appeared in the object file, then the following field is present:

    index    Index to the first member of the enum. That is, this is an index to the structure member definition record that defines members to the enum.

Function types

    TID_FUNCTION

    index    The type index of the type returned.

    byte     The language modifier byte:

                 0x0    Near C function
                 0x1    Near Pascal function
                 0x2    Unused.
                 0x3    Unused.
                 0x4    Far C function
                 0x5    Far Pascal function
                 0x6    Unused.
                 0x7    Interrupt function.

    byte     This byte is set to one if the function accepts a variable number of arguments; otherwise, it is zero.

Sets

    TID_SET

    index    The parent type.

Binary files

    TID_BFILE

    index    The element type.
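
The size arithmetic described above for TID_CARRAY and for the very large types (TID_VLARRAY, TID_VLSTRUCT, TID_VLUNION) can be sketched as follows. The function names fullSize and arrayDimension are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Combines the normal 16-bit size word of a type record with the extra
// upper word carried by the very large types into one double word size.
std::uint32_t fullSize(std::uint16_t sizeWord, std::uint16_t upperWord) {
    return (static_cast<std::uint32_t>(upperWord) << 16) | sizeWord;
}

// Dimension of a C array: the size of the overall array divided by the
// size of each element. No padding is assumed between elements.
std::uint32_t arrayDimension(std::uint32_t arraySize,
                             std::uint32_t elementSize) {
    assert(elementSize != 0);
    return arraySize / elementSize;
}
```

For example, an array whose overall size is 40 bytes with 2-byte elements has dimension 20, and a very large array with size word 0x0D40 and upper word 0x0003 has a full size of 0x30D40 (200000) bytes.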

Member/duplicate functions

    TID_SPECIALFUNC

    index    The type index of the return type.

    byte     Language modifier byte (same as regular functions).

    byte     Bit 0 is set to indicate a member function; bit 1 is set to indicate a duplicate function; bit 2 is set to indicate an operator function; bit 3 is set to indicate internal linkage; bit 4 is set to indicate that this is a Pascal function passing 'this' as the last parameter.

    index    The type index of the class if the function is a member function.

    index    Word offset in the virtual table if the function is a member function.

    name     if the function is a nonlocal member function.

    this should appear as a local symbol in the second inner scope of a member function, not in the outermost (parameter) scope.

C++ Class

    TID_CLASS

    index    The class index for this class.

Pointed-to members

    TID_MEMBERPTR

    index    The type index of the pointed-to type.

    index    The class index of the class whose members are pointed to.

New style pointed-to members

    TID_NEWMEMBERPTR

    byte     Member pointer flags.

    index    The type index of the pointed-to type.

    index    The class index of the class whose members are pointed to.

0xe4 Enum member definition

Typically, all members of a single enum are written to a single member record. If the number of members is so great that the OMF record exceeds 8K, or if the OMF record exceeds 8K for some other reason, the members of a single enum might be spanned across multiple records. Only the last member of the enum has the terminating bit (high bit of first byte) set. No more than one enum can appear in a member definition record.

If the debug version record 0xf9 has not appeared in the object file, then one or more member definition records for an enum are written immediately before the type record for that enum; otherwise, the enum member definition record must appear after the type for the enum.

Each record in a consecutive set of member definition records consists of the following data:

    byte     0x80 for the last member of the enum; otherwise this byte is set to zero.

    string   The member name.

    word     The member value.

0xe5 Begin scope record

Scopes are defined by a pair of begin-scope and end-scope records. The relationships of nested scopes are specified by enclosing the begin/end records of one scope between the begin/end records of another. Local symbols are defined for a scope by placing the locals definition records between the begin/end records of the scope.

    index    The segment index of the segment containing the scope. This segment must be the same as the segment of the starting address.

    word     The offset, relative to the code segment, of the start of this scope.

0xe6 Locals definition record

This record consists of a set of symbol definitions, all local to the innermost enclosing scope. The following list shows the contents for each symbol:

    string   The symbol name.

    index    The symbol type index.

    byte     The symbol class byte.

The remainder of the symbol depends on the value of the symbol class byte:

SC_TYPEDEF (6) and SC_TAG (7)

If the debug information version record 3.1 appears in the object file, the following fields are present:

First is a 16-bit file index. If this is zero, there is no reference info. If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number and the 7th bit being a toggle that specifies whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, it is a special encoding with the following meaning:

    0xff     Next byte/word is a file index, with an absolute word line number following.
    0xfe     The absolute line number is stored in the next word, and this is a reference.
    0xfd     The absolute line number is stored in the next word, and this is an assignment.

The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

SC_STATIC (0)

    index    The group index of the symbol.

    index    The segment index of the segment containing the symbol. For an absolute symbol this must be an absolute segment.

    word     The offset relative to the given segment of the symbol.

If the debug information version record 3.1 appears in the object file, the following fields are present:

First is a 16-bit file index. If this is zero, there is no reference info. If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number and the 7th bit being a toggle that specifies whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, it is a special encoding with the following meaning:

    0xff     Next byte/word is a file index, with an absolute word line number following.
    0xfe     The absolute line number is stored in the next word, and this is a reference.
    0xfd     The absolute line number is stored in the next word, and this is an assignment.

The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

SC_ABSOLUTE (1)

    index    The segment index of the segment containing the symbol. For an absolute symbol the index must be an absolute segment.

    word     The offset relative to the given segment of the symbol.

If the debug information version record 3.1 appears in the object file, the following fields are present:

First is a 16-bit file index. If this is zero, there is no reference info.
If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number and the 7th bit being a toggle that specifies whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, it is a special encoding with the following meaning:

    0xff     Next byte/word is a file index, with an absolute word line number following.
    0xfe     The absolute line number is stored in the next word, and this is a reference.
    0xfd     The absolute line number is stored in the next word, and this is an assignment.

The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

SC_AUTO (2) and SC_PASVAR (3)

    word     The signed offset, relative to BP, of the symbol. For Pascal variable parameter symbols, the location contains the address of the symbol.

If the debug information version record 3.1 appears in the object file, the following fields are present:

First is a 16-bit file index. If this is zero, there is no reference info. If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number and the 7th bit being a toggle that specifies whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, it is a special encoding with the following meaning:

    0xff     Next byte/word is a file index, with an absolute word line number following.
    0xfe     The absolute line number is stored in the next word, and this is a reference.
    0xfd     The absolute line number is stored in the next word, and this is an assignment.

The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

SC_REGISTER (4)

    byte     A register id. Register ids map to registers as follows:

                 0x00 AX     0x01 CX     0x02 DX     0x03 BX
                 0x04 SP     0x05 BP     0x06 SI     0x07 DI
                 0x08 AL     0x09 CL     0x0A DL     0x0B BL
                 0x0C AH     0x0D CH     0x0E DH     0x0F BH
                 0x10 ES     0x11 CS     0x12 SS     0x13 DS
                 0x14 FS     0x15 GS
                 0x18 EAX    0x19 ECX    0x1A EDX    0x1B EBX
                 0x1C ESP    0x1D EBP    0x1E ESI    0x1F EDI

             If the register ID value is greater than 0x28, the field instead specifies an offset (minus 0x28) into the optimized symbols table, which holds the live range information for this variable.

If the debug information version record 3.1 appears in the object file, the following fields are present:

First is a 16-bit file index. If this is zero, there is no reference info. If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number and the 7th bit being a toggle that specifies whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, it is a special encoding with the following meaning:

    0xff     Next byte/word is a file index, with an absolute word line number following.
    0xfe     The absolute line number is stored in the next word, and this is a reference.
    0xfd     The absolute line number is stored in the next word, and this is an assignment.

The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

SC_CONST (5)

    dword    The 32-bit constant value.

If the debug information version record 3.1 appears in the object file, the following fields are present:

First is a 16-bit file index. If this is zero, there is no reference info.
If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number and the 7th bit being a toggle that specifies whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, it is a special encoding with the following meaning:

    0xff     Next byte/word is a file index, with an absolute word line number following.
    0xfe     The absolute line number is stored in the next word, and this is a reference.
    0xfd     The absolute line number is stored in the next word, and this is an assignment.

The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

SC_OPT (8)

    index    The number of entries for this local. Each entry represents a different location for the local for a different set of code offsets; hence, a single SC_OPT sub-record represents a complete list of optimized symbol records for the debugger. The following section describes the format of the entries:

    * word   Starting offset of the live range of the variable. The offset is relative to the offset of the outermost enclosing scope.

    * word   Ending offset of the live range of the variable. The offset is relative to the offset of the outermost enclosing scope.

    * byte   One of SC_AUTO, SC_PASVAR or SC_REGISTER.

If the debug information version record 3.1 appears in the object file, the following fields are present:

First is a 16-bit file index. If this is zero, there is no reference info. If this is nonzero, a set of line numbers with special encodings follows. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0.
By default, the delta is stored in a byte, with the low 6 bits being the line number, and the 7th bit being a toggle to specify whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, then it is a special encoding with the following meaning:
0xff Next byte/word is a file index, with an absolute word line number following.
0xfe The absolute line number is stored in the next word, and this is a reference.
0xfd The absolute line number is stored in the next word, and this is an assignment.
The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

SC_AUTO and SC_PASVAR
* word The signed offset, relative to BP, of the symbol. For Pascal variable parameter symbols, the location contains the address of the symbol.

If the debug information version record 3.1 appears in the object file, the following fields are present: First is a 16-bit file index. If this is zero, there is no reference info. If this is non-zero, there follows a set of line numbers with special encodings. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number, and the 7th bit being a toggle to specify whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, then it is a special encoding with the following meaning:
0xff Next byte/word is a file index, with an absolute word line number following.
0xfe The absolute line number is stored in the next word, and this is a reference.
0xfd The absolute line number is stored in the next word, and this is an assignment.
The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

SC_REGISTER
* byte The register id.
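The SC_OPT entry list above lends itself to a simple lookup: given a code offset, find the entry whose live range covers it and read the location from that entry. The following is only a minimal sketch. The opt_entry structure and the find_live_entry function are hypothetical names for an already-parsed, in-memory form of the entries, and treating the ending offset as inclusive is an assumption that the text does not confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Symbol class values as given in this chapter. */
#define SC_AUTO     2
#define SC_PASVAR   3
#define SC_REGISTER 4

/* Hypothetical in-memory form of one parsed SC_OPT entry. */
typedef struct {
    uint16_t start;      /* start of live range, relative to outermost scope */
    uint16_t end;        /* end of live range, same base (assumed inclusive) */
    uint8_t  sym_class;  /* SC_AUTO, SC_PASVAR or SC_REGISTER */
    int16_t  bp_offset;  /* meaningful for SC_AUTO and SC_PASVAR only */
    uint8_t  reg_id;     /* meaningful for SC_REGISTER only */
} opt_entry;

/* Return the entry whose live range covers `offset`, or NULL when the
   variable has no location at that offset. */
static const opt_entry *find_live_entry(const opt_entry *entries,
                                        size_t count, uint16_t offset)
{
    assert(entries != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (offset >= entries[i].start && offset <= entries[i].end)
            return &entries[i];
    }
    return NULL;
}
```

A debugger-style caller would compute the offset relative to the outermost enclosing scope, call find_live_entry, and then interpret either bp_offset or reg_id according to sym_class.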
SC_OPT is complex so that it can handle a variable that lives in a register, is spilled to the stack, and is then moved to a register again. Borland C++ Version 4.0 does not need this complexity, because it does not implement split live ranges; however, this specification was written to cover all contingencies, such as a compiler that becomes smarter about live ranges.

If the debug information version record 0xf9 appears in the object file, the following fields are present:
* index Index of the source file that caused this record to be emitted.
* word Line number of the source code that caused this record to be emitted. This word is present only if the previous index is nonzero.

If the debug information version record 3.1 appears in the object file, the following fields are present: First is a 16-bit file index. If this is zero, there is no reference info. If this is non-zero, there follows a set of line numbers with special encodings. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number, and the 7th bit being a toggle to specify whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, then it is a special encoding with the following meaning:
0xff Next byte/word is a file index, with an absolute word line number following.
0xfe The absolute line number is stored in the next word, and this is a reference.
0xfd The absolute line number is stored in the next word, and this is an assignment.
The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

0xe7 End of scope
word The offset relative to the code segment of the end of the scope.

0xe8 Select source file
This comment is placed before any line numbers for a particular file.
It's not needed if line numbers aren't generated before the next source file is encountered.
index The source-file index of the new source file. If no further data exists in this record, then this index refers to an existing source file specified in a Select Source File record; otherwise, it is followed by the source file name and time stamp.
string The source file name, relative to the current path.
dword The DOS date and time stamp for the file.

0xe9 Dependency file definition
This comment is included for each distinct source and include file in the object module. The records should be placed near the top of the object file, since a MAKE utility must scan the file for dependency records. The first dependency record must precede any noncomment record other than the THEADR record.
dword The DOS date and time stamp for the file.
string The name of the source file. This is the string used to open the file. For Turbo C, if an include file is found in a -I directory, the directory name is prepended to the filename, allowing the MAKE utility to check dependencies by simply retrieving the file time stamp without searching through a path.
If the record has zero length, then there are no more dependency records in the object file.

0xea Compile parameters record
1st byte The source language for this object file. If an assembler source contains debugging information, the language is the one specified in the source, not assembly language. The following language types are defined:
0 - unspecified
1 - C
2 - Pascal
3 - Basic
4 - Assembly
5 - C++
2nd byte
1 bit This bit is one if underbars were prepended to C language source symbols; otherwise, it's zero.
3 bits These bits specify the memory model and, therefore, the default pointer sizes for this source:
0 - Tiny
1 - Small
2 - Medium
3 - Compact
4 - Large
5 - Huge
6 - 80386 Small
7 - 80386 Medium
8 - 80386 Compact
9 - 80386 Large
Code pointers are near in the Tiny, Small, and Compact models, and far otherwise.
Data pointers are near in the Tiny, Small, and Medium models, and far otherwise. The 80386 models are analogous to the corresponding 8086 models: a near 80386 pointer has a 32-bit offset, and a far 80386 pointer is a 48-bit pointer.

0xeb External symbol matched type index
The following fields are repeated as many times as necessary to fit in the record.
* string The symbol name itself.
* index The type index of the symbol.

If the debug information version record 0xf9 appears in the object file, the following fields are present:
* index Index of the source file that caused this record to be emitted.
* word Line number of the source code line that caused this record to be emitted. This word is present only if the previous index is nonzero.

If the debug information version record 3.1 appears in the object file, the following fields are present: First is a 16-bit file index. If this is zero, there is no reference info. If this is non-zero, there follows a set of line numbers with special encodings. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number, and the 7th bit being a toggle to specify whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, then it is a special encoding with the following meaning:
0xff Next byte/word is a file index, with an absolute word line number following.
0xfe The absolute line number is stored in the next word, and this is a reference.
0xfd The absolute line number is stored in the next word, and this is an assignment.
The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

0xec Public symbol matched type index
The following fields are repeated as many times as necessary to fit in the record.
* string The name of the public symbol.
* index The type index for the symbol.
* byte This byte contains the same information as the valid BP byte previously defined.

If the debug information version record 0xf9 appears in the object file, the following fields are present:
* index Index of the source file that caused this record to be emitted.
* word Line number of the source code line that caused this record to be emitted. This word is present only if the previous index is nonzero.

If the debug information version record 3.1 appears in the object file, the following fields are present: First is a 16-bit file index. If this is zero, there is no reference info. If this is non-zero, there follows a set of line numbers with special encodings. Each line number is stored as a delta from the previous line number, with the starting line number after a file index being 0. By default, the delta is stored in a byte, with the low 6 bits being the line number, and the 7th bit being a toggle to specify whether this is a reference or an assignment. If the 7th bit is set, it is an assignment. If the byte is greater than or equal to 0xf0, then it is a special encoding with the following meaning:
0xff Next byte/word is a file index, with an absolute word line number following.
0xfe The absolute line number is stored in the next word, and this is a reference.
0xfd The absolute line number is stored in the next word, and this is an assignment.
The remaining values are reserved. A reference info object is terminated by 2 zero bytes.

0xed Class definition
This record describes classes. The class definition records have the following format:
* byte 0 = class description

Class descriptions
Class descriptions have the following format:
* index Class index for the class. TID_CLASS and TID_MEMBERPTR type records refer to this index.
* word Offset (in bytes) of the .
If the debug information version record 0xf9 appears in the object file, the following field is present:
* index Index to the first member of the structure; that is, an index to the structure member definition record that defines members to this class.
byte Info bits:
bit 0: Class declared as 'struct'
bit 1: 'huge' class (far vtable pointer)
bit 2: 'far' class (far 'this' pointer)
bit 3: 'far' class that uses 'near' vbase pointers
bit 4: a union
* index The number of parent indices that follow.
* word(s) Indices of parent classes (repeated); the highest bit is set for virtual base classes.

Note: If a class definition appears between begin-scope and end-scope records, it is interpreted as a locally defined class.

0xee Coverage offset record
To aid in profiling, the compiler emits offsets to delimit the start and end of basic blocks. The offsets, if taken pairwise, define the beginning and end of a basic block. The offsets are relative to the specified logical segment defined in the object file.
* index The segment index of the segment, corresponding to the offsets that follow.
* array of words Each word corresponds to an offset. The length of the array is dictated by the length of the OMF record.

0xf5 Begin large scope record
Scopes are defined by a pair of begin-scope and end-scope records. The relationships of nested scopes are specified by enclosing the begin/end records of one scope between the begin/end records of another. Local symbols are defined for a scope by placing the locals definition records between the begin/end records of the scope.
* index The segment index of the segment containing the scope. This segment must be the same as the segment of the starting address.
* double word The large offset, relative to the code segment, of the start of this scope.

0xf6 Large offset locals definition record
This record consists of a set of symbol definitions, all local to the innermost enclosing scope.
The following list shows the contents for each symbol:
* string The symbol name.
* index The symbol type index.
* byte The symbol class byte.
The remainder of the symbol depends on the value of the symbol class byte:

SC_STATIC (0)
* index The group index of the segment containing the symbol.
* index The segment index of the segment containing the symbol.
* double word The large offset relative to the given segment of the symbol.

SC_ABSOLUTE (1)
* index The segment index of the segment containing the symbol. For an absolute symbol the segment must be absolute.
* double word The large offset relative to the given segment of the symbol.

SC_AUTO (2) and SC_PASVAR (3)
* double word The signed large offset, relative to BP, of the symbol. For Pascal variable parameter symbols, the location contains the address of the symbol.

0xf7 Large end of scope
double word The large offset relative to the code segment of the end of the scope.

0xf8 Member function
This record must be located immediately after the outermost begin-scope record for every member function. It contains one field:
* string Mangled name of member function.

0xf9 Debug Information Version
This record immediately follows the compiler identification comment record. It specifies the major and minor version numbers of the debug information present in this file. If the major version of the debug information is higher than the major version that the linker understands, then all debug information is ignored. The minor version is ignored by the linker and is only used by diagnostic tools such as TDUMP. Borland C++ Version 4.0 emits version 4.01 debugging information. The record contains two fields:
* Major byte Major version of the debug information.
* Minor byte Minor version of the debug information.

0xfa Module optimization flags
This record presents the module optimization flags previously described.
The compiler is responsible for emitting flags, which the linker passes unchanged to the debug information in the .EXE file.
* dword Optimization flags.
The following flags are currently defined:

#define MO_globalCSEs    0x0001
#define MO_localCSEs     0x0002
#define MO_inductVars    0x0004
#define MO_codeMotion    0x0008
#define MO_regAlloc      0x0010
#define MO_loadOptim     0x0020
#define MO_loopOpt       0x0040
#define MO_intrinsics    0x0080
#define MO_deadStorElim  0x0100
#define MO_copyProp      0x0200
#define MO_jumpOpt       0x0400
#define MO_speed_size    0x0800
#define MO_noAliasing    0x1000

.OBJ extensions for 32 bits
The .OBJ spec was originally designed for the 16-bit world. Fortunately, its designers allotted only even-numbered record types to the standard 16-bit records. The following extension uses the odd-numbered record types to represent the 32-bit equivalents where needed.

SEGD32 (99h) Size field is 32 bits.
LEDA32 (A1h) Offset field is 32 bits.
LIDA32 (A3h) Offset field is 32 bits, iteration count fields are 32 bits.
PUBD32 (91h) Offset field is 32 bits.
MODE32 (8Bh) Starting offset field is 32 bits.
LINN32 (95h) Offset is 32 bits.
FIXU32 (9Dh) Offset and displacement are 32 bits.

In the SEGDEF and SEGD32 records, the ACBP byte is redefined as follows: Bit 0 (formerly InPage) now means USE32 when set. The align types are extended to include DWORD alignment after PAGE alignment.

This specification can be extended to include other record types, as needed. The 16-bit equivalent of any record can be used until one or more fields exceed the 16-bit size limitation. TASM uses this minimalist approach in generating records to save space.

VIRDEF Records
The following modified record is provided for the linker to support unique instantiation of virtual tables, "out of line inlines" and various thunks the compiler generates. The mechanism is called "VIRDEF" and it is similar to an initializable COMDEF. It begins with a change to the COMDEF record.
A VIRDEF is identical to a COMDEF record, except that the "segment type" must be a number in the range 1..0x5F (instead of the 0x61 and 0x62 far and near COMDEF types). It is interpreted as a segment index and may refer to any SegDef in the current module, with the meaning that the VIRDEF is to be appended to that segment if it is instantiated. The record format is like that for a near COMDEF, with a single length count.

The VIRDEF defines both a Public name and an External Index in the same way as a COMDEF does. VIRDEFs cannot be resolved onto a Public or a COMDEF of the same name: any attempt to mix them is a link-time error. All VIRDEFs of the same name are taken to be identical.

When all source files have been read and the linker has decided which modules are to be kept and which are to be discarded, it scans the list of instances of each VIRDEF. It ignores instances in discarded modules and selects the first of the largest instances (or the first if all are equal in size). That instance is updated as the actual public symbol. Its segment is chosen (in the case where the VIRDEFs do not all attach to the same segment) and its module is noted. Only the LEDATA records from that module are used; the others are ignored.

VIRDEFs may be attached to either data or code segments. If a uniform choice of segment is not made and the code generated to reference the VIRDEF cannot reach the target, then it generates fixup overflows in the usual way: it is not an error to have a single name of VIRDEF with different segments unless it results in overflows.

A COMDEF may be seen as a "special case" of a VIRDEF, one which is attached to either BSS or an invented FAR segment, and which is never initialized with LEDATA.

When a reference is made to a VIRDEF from other object file records, the index that refers to the VIRDEF will be greater than 0x4000.
To use the index, subtract 0x4000, and use it as a normal index. These changes are not compatible with Microsoft's LINK, but they occur only in C++ code.

CHAPTER 3
Symbol table format

TLINK's debugging output is written at the end of the load image in the .EXE file. An image that does not include extra information beyond the image size has no debug information. If extra data is written beyond the load image, check the first word for the number 0x52fb. The debug information begins with a header describing the sizes of the remaining tables. This header is defined as follows:

struct debug_header {
    unsigned short magic_number;      /* To be sure who we are */
    unsigned short version_id;        /* In case we change things */
    unsigned long names;              /* Names pool size in bytes */
    unsigned long names_count;        /* Number of names in pool */
    unsigned long types_count;        /* Number of type entries */
    unsigned long members_count;      /* Structure members table */
    unsigned long symbols_count;      /* Number of symbols */
    unsigned long globals_count;      /* Number of global symbols */
    unsigned long modules_count;      /* Number of modules (units) */
    unsigned long locals_count;       /* optional; can be filler */
    unsigned long scopes_count;       /* Number of scopes in table */
    unsigned long lines_count;        /* Number of line nos */
    unsigned long source_count;       /* Number of include files */
    unsigned long segment_count;      /* number of segment records */
    unsigned long correlation_count;  /* number of segment/file */
                                      /* correlations */
    unsigned long image_size;         /* The number of bytes in */
                                      /* the .EXE file if the */
                                      /* uninitialized part of */
                                      /* the data, plus this */
                                      /* debug info were removed. */
    void far *debugger_hook;          /* A far ptr into debugged */
                                      /* program, meaning depends */
                                      /* on program flags. For pascal */
                                      /* overlays, is ptr to start of */
                                      /* data area that contains info */
                                      /* about the overlays.
                                      */
    unsigned char program_flags;      /* A byte of flags */
                                      /* 0x01 = Case sensitive link */
                                      /* 0x00 = Case insensitive link */
                                      /* 0x02 = pascal overlay program */
    unsigned stringsegoffset;         /* No longer used */
    unsigned short data_count;        /* size in bytes of data pool */
    unsigned char filler;             /* to force alignment */
    unsigned short extension_size;    /* 0, or 16, for now */
};

struct header_extension {
    unsigned long class_entries;         /* number of classes */
    unsigned long parent_entries;        /* number of parents */
    unsigned long global_classes;        /* number of global classes */
                                         /* - NOT USED */
    unsigned long scope_class_entries;   /* number of scope classes */
    unsigned long module_class_entries;  /* number of module classes */
    unsigned long CoverageOffsetCount;   /* number of coverage offsets */
    unsigned long NamePoolOffset;        /* offset to start of name */
                                         /* pool. This is relative */
                                         /* to the symbols base */
    unsigned long BrowserEntries;        /* number of browser info recs */
    unsigned long OptSymEntries;         /* number of opt symbol recs */
    unsigned int DebugFlags;             /* various flags */
    unsigned long refInfoSize;           /* size in bytes of ref */
                                         /* info section */
    char filler [14];                    /* padding */
};

typedef struct  /* Trailer at end of NEW EXE with debug info */
{
    unsigned short Signature;  /* 'NB' */
    unsigned short Version;    /* MS debug info version number */
    unsigned long Size;        /* Codeview header offset = */
                               /* (EOF - Size) */
} TMSDbgTrailer;

The layout appears in the .EXE files as follows:

EXE header
fixups
EXE image
debug header
Symbol Table
Module Table
Source File Table
Scopes Table
Line Number Table
Segments Table
Correlation Table
Type Table
Members Table
Class Table
Parent Table
Scope Class Table
Module Class Table
Coverage Map Table
Coverage Offsets Table
Browser Definitions Table
Optimized Symbols Table
Module Optimization Flags Table
Reference Information Table
Names Table

For new .EXE files, there will be an 8-byte Codeview header immediately before the debug header,
and an 8-byte Codeview trailer immediately after the names table. TD symbol tables can be told apart from Microsoft-generated tables by the value 0xFFFFFFFF in the last 4 bytes of the Codeview header.

All symbols, global or not, appear in the symbols area. The globals appear first, with module and local symbols following. The globals_count field specifies how many of the symbols are globals.

Identifiers are stored as indexes into the names pool. The index is the relative identifier number (starting at 1). This way 64K distinct identifiers of arbitrary length can be stored. Names are stored uniquely, so that comparing indexes is as good as comparing strings. An identifier is stored in the pool as an ASCIIZ (null-terminated) string.

Symbols

struct symbol_record {
    unsigned long symbol_name;
    unsigned long symbol_type;
    unsigned short symbol_offset;
    unsigned short symbol_segment;
    unsigned short symbol_class : 3;
    unsigned short has_valid_BP : 1;
    unsigned short return_address_word_offset : 3;
};

The symbol table consists of a series of symbol definitions, sorted into ascending address order, with constant symbols (symbol_class == 5) at the end of each section (global or module local). Note also that globals are all static, absolute, or typedefs. No register globals are generated by Borland compilers at this time.

symbol_name is the index of the symbol name.
symbol_type is the index of the symbol type.
symbol_offset is interpreted according to the symbol_class field.
symbol_segment is the segment part of the symbol address for static symbols. For new .EXE files, the top two bits of symbol_segment provide information about symbols in DLLs as follows: if the SR_SS_DllEntry bit is non-zero, then SR_SS_OrdinalFlag determines whether the SR_SS_Ordinal field of symbol_segment is an ordinal value. For DLLs, symbol_offset is the name index of the module and symbol_name is the name index of the DLL's entry point.
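The DLL bits of symbol_segment can be tested with the SR_SS_ mask values that this chapter defines alongside the symbol classes. The following is a minimal sketch; the three helper function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Mask values as defined in this chapter. */
#define SR_SS_DllEntry    0x8000u  /* symbol is a dll entry */
#define SR_SS_OrdinalFlag 0x4000u  /* segment is ordinal value */
#define SR_SS_Ordinal     0x3fffu  /* mask to obtain ordinal value */

/* Non-zero when the symbol refers to a DLL entry. */
static int is_dll_entry(uint16_t symbol_segment)
{
    return (symbol_segment & SR_SS_DllEntry) != 0;
}

/* Non-zero when the symbol is a DLL entry that carries an ordinal. */
static int has_ordinal(uint16_t symbol_segment)
{
    return is_dll_entry(symbol_segment) &&
           (symbol_segment & SR_SS_OrdinalFlag) != 0;
}

/* Extract the ordinal; only meaningful when has_ordinal() is non-zero. */
static uint16_t dll_ordinal(uint16_t symbol_segment)
{
    assert(has_ordinal(symbol_segment));
    return (uint16_t)(symbol_segment & SR_SS_Ordinal);
}
```

For example, a symbol_segment value of 0xC005 has both top bits set, so it would be read as a DLL entry with ordinal 5.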
symbol_class is one of the following:

Value  Symbol class
0x0    Static. Offset and segment give the address.
0x1    Absolute symbol. The segment and offset are the absolute address of the symbol.
0x2    Auto. Offset is treated as signed, relative to BP.
0x3    Pascal var parameter. The offset is BP relative and is the location of the far pointer to the parameter.
0x4    Register. Offset is a register ID as follows:

       0x00 AX   0x0A DL   0x14 FS    0x20 ST(0)
       0x01 CX   0x0B BL   0x15 GS    0x21 ST(1)
       0x02 DX   0x0C AH   0x18 EAX   0x22 ST(2)
       0x03 BX   0x0D CH   0x19 ECX   0x23 ST(3)
       0x04 SP   0x0E DH   0x1A EDX   0x24 ST(4)
       0x05 BP   0x0F BH   0x1B EBX   0x25 ST(5)
       0x06 SI   0x10 ES   0x1C ESP   0x26 ST(6)
       0x07 DI   0x11 CS   0x1D EBP   0x27 ST(7)
       0x08 AL   0x12 SS   0x1E ESI
       0x09 CL   0x13 DS   0x1F EDI

0x5    Constant. Up to 4-byte constant stored in offset/segment.
0x6    Typedef. The offset field is ignored.
0x7    Structure/Union/Enum Tag. The offset is a type index.

#define SC_STATIC    0x0
#define SC_ABSOLUTE  0x1
#define SC_AUTO      0x2
#define SC_PASVAR    0x3
#define SC_REGISTER  0x4
#define SC_CONST     0x5
#define SC_TYPEDEF   0x6
#define SC_TAG       0x7

#define SR_SS_DllEntry     0x8000  /* symbol is a dll entry */
#define SR_SS_OrdinalFlag  0x4000  /* segment is ordinal value */
#define SR_SS_Ordinal      0x3fff  /* mask to obtain ordinal value */

The has_valid_BP field is defined for functions only. If the bit is zero, the function does not set up a BP stack frame; if it is one, a valid BP is set up.

The return_address_word_offset field contains the offset in words from BP where the return address can be found if the has_valid_BP field is not zero. The size of the return address is determined from the function type.

Modules
A module (or unit) consists of a set of objects, source files, and correlation records.
struct module_header {
    unsigned long module_name;
    unsigned char language;
    unsigned short memory_model : 3;
    unsigned short underbars_on : 1;
    unsigned long symbols_index;
    unsigned short symbols_count;
    unsigned short source_files_index;
    unsigned short source_files_count;
    unsigned short correlation_index;
    unsigned short correlation_count;
};

#define MM_TINY        0x0
#define MM_SMALL       0x1
#define MM_MEDIUM      0x2
#define MM_COMPACT     0x3
#define MM_LARGE       0x4
#define MM_HUGE        0x5
#define MM_SMALL386    0x6
#define MM_MEDIUM386   0x7
#define MM_COMPACT386  0x8
#define MM_LARGE386    0x9

module_name is the index of the module's name. This name is the source file name given to the compiler, including the extension.
symbols_index is the index of the first symbol in the symbol table for the module.
symbols_count is the number of symbols defined local to the module.
source_files_index is the index of the first source file record for the module.
source_files_count is the number of source files in the module.
correlation_index is the index of the correlation record for the module.
correlation_count is the number of correlation entries in the module.
language indicates the source language for the module.

Value  Language
0      Unknown
1      C
2      Pascal
3      Basic (not used)
4      Assembly language
5      C++

memory_model determines default pointer sizes in type conversions.
underbars_on is non-zero if underbars should be prepended for cdecl-style symbols in any search context in this module.

Source files

struct source_file {
    unsigned long source_file_name;
    unsigned long time_stamp;
};

Each source file with line numbers in the executable code will have a source file record in the module's list of source files. There will always be at least one source file record per module (assuming there is any executable code in the module).
Each include file containing code will generate a single source-file record per inclusion. The line numbers for a segment within a source file will appear as a block in the line number table.

The source files in a module will appear in the order of their appearance in the compilation process. Thus the main source file appears first, followed by each of the include files. Note that if an include file doesn't have executable code (and therefore no source line numbers), it shouldn't be included here. Thus, for most source files with no code in include files, there will be only one file entry per module. Of course, if no executable code appears in a module, there is no need for a source file record.

The source file name will include any subdirectory information. Thus, if Turbo Debugger is run in the source directory (or with the source directory given in the appropriate TD option), it should be able to find all the source, even if it originated from some other source or had some peculiar file-name extension. For include files, the actual path name used to open the file is used. This way the debugger doesn't duplicate the compiler's include directory search logic.

The date/time stamp determines if the source file has changed since the time of the link.

Line numbers

struct line_number {
    unsigned short line_number_value;
    unsigned short line_number_offset;
};

line_number_value is the module line number.
line_number_offset is the offset of the line number relative to the segment value stored in the segment record referred to in the active correlation record.

Only unique offsets have line numbers stored. When a statement spans several lines, there can be two line records with the same offset, but different line numbers.

The line number records are address sorted; they are not necessarily line-number ordered.
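Because the line number records are sorted by address, a debugger can map a code offset to a source line with a binary search. The following is a minimal sketch, assuming the caller passes the block of records belonging to one correlation record, sorted by ascending line_number_offset. The function name find_line is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct line_number {
    unsigned short line_number_value;
    unsigned short line_number_offset;
};

/* Return the last record whose offset is <= `offset`, or NULL if the
   offset lies before the first record. When two records share an
   offset, the later one in the table is returned. */
static const struct line_number *find_line(const struct line_number *recs,
                                           size_t count,
                                           unsigned short offset)
{
    const struct line_number *best = NULL;
    size_t lo = 0, hi = count;

    assert(recs != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (recs[mid].line_number_offset <= offset) {
            best = &recs[mid];   /* candidate; look for a later one */
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return best;
}
```

The search relies only on the address ordering stated above; it makes no assumption about the order of the line number values themselves.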
Scopes

struct scope {
    unsigned long autos_index;
    unsigned short autos_count;
    unsigned short parent_scope;
    unsigned long function_symbol;
    unsigned short scope_offset;
    unsigned short scope_length;
};

autos_index and autos_count define the symbol table area containing this scope's symbols. autos_index is the index into the symbols table of the first variable local to the scope.
parent_scope is the index of the scope within the current module of the immediate enclosing scope.
scope_offset and scope_length define the range of code addresses the scope is valid for. The segment is that stored in the segment record referred to in the active correlation record.

To handle nested units in Pascal, there is a set of scopes at the beginning of the scopes table with a function_symbol of 0xffff. There is a one-to-one correspondence between these and the module (unit) records. These are the "unit scopes." The symbols that the record points to are the interfaced symbols of the unit.

The "uses scope" record has a function_symbol of 0xfffe to establish the correct linking between the unit scope records. It does not contain information about the scope's symbols. Instead, autos_index is an index to the unit scope record that refers to the interfaced symbols. To look up a name, the scopes are traced using the parent_scope fields, but the symbols are accessed by referring to the corresponding unit scope record.

Segments

typedef struct  /* segment info */
{
    unsigned short mod_index;
    unsigned short code_segment;
    unsigned short code_offset;
    unsigned short code_length;
    unsigned short scopes_index;
    unsigned short scopes_count;
    unsigned short correlation_index;
    unsigned short correlation_count;
} segrec;

A segment record gives a code segment, offset, and length, and relates it to a particular module. It also gives an index into the scopes table for the scopes defined in the segment.
The correlation table index and count allow the segment to be related to one or more source files and possibly to non-contiguous groups of lines inside the files. The segment records are address-ordered by segment and then by offset within the segment.

mod_index is the index of the module record for the corresponding module.
code_segment is the base address of the segment in the image.
code_offset is the offset from the base address of the segment in the image.
code_length is the length of the segment.
scopes_index is the index of the scope record of the starting scope for this segment.
scopes_count is the count of scopes for this segment.
correlation_index is the index of the correlation record for the starting correlation for this segment.
correlation_count is the number of correlation records for this segment.

Segment/source file correlations
These records link a range of line numbers in a file to a particular segment record.

typedef struct
{
    unsigned short segment_index;
    unsigned short file_index;
    unsigned long lines_index;
    unsigned short lines_count;
} correlation;

segment_index is the index of the segment record for this correlation.
file_index is the index of the source file record for this correlation.
lines_index is the index of the first line number record for this correlation.
lines_count is the number of line number records for this correlation.

Types
The type table consists of a set of 12-byte entries. Each type occupies one or (for a few types) two entries. The index value is used when a type is referred to. Since no operations need to search the type table itself (all accesses will use index numbers), any type that occupies more than one entry will not have a type id byte for the upper half. Thus type records are effectively either 12 or 24 bytes long, depending on the particular type. Also, since only two sizes are present, a program can treat the table effectively as a table of fixed-size objects.
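Because every entry is 12 bytes, the type table can be addressed as a flat array. The following is a minimal sketch that reads the common fields of one entry, using the offsets given in the common-fields table that follows (type_id at 0, type_name at 1, type_size at 5). The names type_common and read_type_common are hypothetical, little-endian byte order is assumed, and the mapping from a type index to an entry position is not specified here, so the caller supplies a zero-based position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TYPE_ENTRY_SIZE 12  /* size of one type table entry, per the text */

/* Hypothetical unpacked form of the fields common to all types. */
typedef struct {
    uint8_t  type_id;    /* 1 byte at offset 0 */
    uint32_t type_name;  /* 4 bytes at offset 1; 0 if the type is unnamed */
    uint16_t type_size;  /* 2 bytes at offset 5 */
} type_common;

/* Read the common fields of the entry at zero-based position `pos`.
   Bytes are assembled by hand, so host alignment and byte order do
   not matter; the table itself is assumed to be little-endian. */
static type_common read_type_common(const uint8_t *table, size_t pos)
{
    const uint8_t *p;
    type_common t;

    assert(table != NULL);
    p = table + pos * TYPE_ENTRY_SIZE;
    t.type_id   = p[0];
    t.type_name = (uint32_t)p[1]
                | ((uint32_t)p[2] << 8)
                | ((uint32_t)p[3] << 16)
                | ((uint32_t)p[4] << 24);
    t.type_size = (uint16_t)(p[5] | (p[6] << 8));
    return t;
}
```

The second entry of a two-entry type carries no type id byte, so this function should be applied only to the first entry of each type.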
Simple types and common fields

The fields in the following table are common to all types.

    Field            Size   Offset
    ------------------------------
    type_id            1      0
    type_name          4      1
    type_size          2      5

type_name is 0 if the type is unnamed or is the name index of the type name.

type_size is the size in bytes of the object. This field is present in all type records.

type_id values are

    #define TID_VOID     0x00  /* Unknown or no type */
    #define TID_LSTR     0x01  /* Basic Literal string */
    #define TID_DSTR     0x02  /* Basic Dynamic string */
    #define TID_PSTR     0x03  /* Pascal style string */

Pascal strings (12 bytes)

    Field            Size   Offset
    ------------------------------
    max_size           1      7

    #define TID_SCHAR    0x04  /* 1 byte signed range */
    #define TID_SINT     0x05  /* 2 byte signed range */
    #define TID_SLONG    0x06  /* 4 byte signed range */
    #define TID_SQUAD    0x07  /* 8 byte signed int */
    #define TID_UCHAR    0x08  /* 1 byte unsigned range */
    #define TID_UINT     0x09  /* 2 byte unsigned range */
    #define TID_ULONG    0x0A  /* 4 byte unsigned range */
    #define TID_UQUAD    0x0B  /* 8 byte unsigned int */
    #define TID_PCHAR    0x0C  /* Pascal character type */

Ranges (24 bytes)

    Field            Size   Offset
    ------------------------------
    parent type        2      8
    lower bound        4     12
    upper bound        4     16

    #define TID_FLOAT    0x0D  /* IEEE 32-bit real */
    #define TID_TPREAL   0x0E  /* Turbo Pascal 6-byte real */
    #define TID_DOUBLE   0x0F  /* IEEE 64-bit real */
    #define TID_LDOUBLE  0x10  /* IEEE 80-bit real */
    #define TID_BCD4     0x11  /* 4 byte BCD */
    #define TID_BCD8     0x12  /* 8 byte BCD */
    #define TID_BCD10    0x13  /* 10 byte BCD */

BCD COBOL (12 bytes)

    Field            Size   Offset
    ------------------------------
    decimal point      1      5

    #define TID_BCDCOB   0x14  /* COBOL BCD */

Pointers (12 bytes)

    Field            Size   Offset
    ------------------------------
    extra info         1      7
    pointed-to type    4      8

    #define TID_NEAR     0x15  /* Near pointer */
    #define TID_FAR      0x16  /* Far pointer */
    #define TID_SEG      0x17  /* Segment pointer */
    #define TID_NEAR386  0x18  /* 386 32-bit offset ptr */
    #define TID_FAR386   0x19  /* 386 48-bit far ptr */

C arrays (12 bytes)

    Field            Size   Offset
    ------------------------------
    element type       4      8

    #define TID_CARRAY   0x1A  /* C array - 0 based */

Very large arrays (12 bytes)

    Field            Size   Offset
    ------------------------------
    object size        2      7
    element type       4      9

    #define TID_VLARRAY  0x1B  /* Very Large 0 based array */

Pascal arrays (24 bytes)

    Field            Size   Offset
    ------------------------------
    element type       4      8
    dimension type     4     12

    #define TID_PARRAY   0x1C  /* Pascal array */

Structs and unions (12 bytes)

    Field            Size   Offset
    ------------------------------
    members index      4      8

    #define TID_ADESC    0x1D  /* Basic array descriptor */
    #define TID_STRUCT   0x1E  /* Structure */
    #define TID_UNION    0x1F  /* Union */

Very large structs and unions (24 bytes)

    Field            Size   Offset
    ------------------------------
    object size        2      7
    members index      4      9

    #define TID_VLSTRUCT 0x20  /* Very Large Structure */
    #define TID_VLUNION  0x21  /* Very Large Union */

Enums (24 bytes)

    Field            Size   Offset
    ------------------------------
    lower bound        2     12
    upper bound        2     14
    members index      4     16

    #define TID_ENUM     0x22  /* Enumerated range */

Functions (12 bytes)

    Field                Size   Offset
    ----------------------------------
    language             0:7    7:0 *
    accepts var. args.   0:1    7:7
    return type          4      8

    * These should be read as byte:bit

    #define TID_FUNCTION 0x23  /* Function or procedure */

Labels (12 bytes)

    Field            Size   Offset
    ------------------------------
    near/far           1      7

    #define TID_LABEL    0x24  /* Goto label */

Sets (12 bytes)

    Field            Size   Offset
    ------------------------------
    parent type        4      8

    #define TID_SET      0x25  /* Pascal set */

Binary files (12 bytes)

    Field            Size   Offset
    ------------------------------
    element type       4      8

    #define TID_TFILE    0x26  /* Pascal text file */
    #define TID_BFILE    0x27  /* Pascal binary file */

Function prototypes (24 bytes)

    Field                Size   Offset
    ----------------------------------
    language             0:7    7:0 *
    accepts var. args.   0:1    7:7
    return type          4      8
    parameter start      2     12

    * These should be read as byte:bit

    #define TID_BOOL          0x28  /* Pascal boolean */
    #define TID_PENUM         0x29  /* Pascal enum */
    #define TID_PWORD         0x2A  /* pword (6 byte 386 ptr) */
    #define TID_TBYTE         0x2B  /* tbyte */
    #define TID_FUNCPROTOTYPE 0x2C  /* Function with full parameter information. */

The language field is as follows:

    Value   Description
    ----------------------------
    0x0     Near C function
    0x1     Near Pascal function
    0x2     Unused
    0x3     Unused
    0x4     Far C function
    0x5     Far Pascal function
    0x6     Unused
    0x7     Interrupt function

Special functions (24 bytes)

    Field            Size   Offset
    ------------------------------
    language           1      7
    return type        4      8
    class type         4     12
    virtual offset     2     16
    symbol index       4     18
    info bits          1     22

class type is the type index of the class.

virtual offset is the offset into the virtual table.

symbol index is the symbol index of this method.

info bits are described in the following table.
    Value   Description
    --------------------------------------------------------
    0x01    member function
    0x02    duplicate function
    0x04    operator function
    0x08    internal linkage
    0x10    Pascal function passing 'this' as last parameter

    /* Special function for methods and duplicate functions. */
    #define TID_SPECIALFUNC 0x2D

Classes (12 bytes)

    Field            Size   Offset
    ------------------------------
    class index        4      8

    #define TID_CLASS    0x2E  /* Class */

Member pointers (24 bytes)

    Field            Size   Offset
    ------------------------------
    type index         4      8
    class index        2     11

    /* TIDs 0x2F and 0x31-0x32 are unused */
    #define TID_HANDLEPTR 0x30  /* Handle-based pointer NOT USED */
    #define TID_MEMBERPTR 0x33  /* Member pointer */
    #define TID_NEWMEMPTR 0x38  /* New style member pointer */

TID_MEMBERPTR

    Field                   Size   Offset
    -------------------------------------
    type index                4      8
    base class index          2     12

TID_NEWMEMPTR

    Field                   Size   Offset
    -------------------------------------
    member ptr flags          1      7
    pointer to type index     4      8
    base class index          2     11

TID_HANDLEPTR

    Field                   Size   Offset
    -------------------------------------
    extra info byte           1      7
    handle string index       4      8
    type index                4     12

Near and far references (24 bytes)

    Field            Size   Offset
    ------------------------------
    type index         4      8
    class index        4     12

    #define TID_NREF         0x34  /* Near reference pointer */
    #define TID_FREF         0x35  /* Far reference pointer */
    #define TID_WORDBOOL     0x36  /* Pascal word boolean */
    #define TID_LONGBOOL     0x37  /* Pascal long boolean */
    #define TID_GLOBALHANDLE 0x3E  /* Windows global handle */
    #define TID_LOCALHANDLE  0x3F  /* Windows local handle */

    /* These can be used to cast a type_rec pointer to the
       appropriate subtype */
    #define _t_pstr(x)      (((struct type_rec *)(x))->v.pstr)
    #define _t_range(x)     (((struct type_rec *)(x))->v.range)
    #define _t_bcd(x)       (((struct type_rec *)(x))->v.bcd)
    #define _t_ptr(x)       (((struct type_rec *)(x))->v.ptr)
    #define _t_seg(x)       (((struct type_rec *)(x))->v.seg)
    #define _t_carray(x)    (((struct type_rec *)(x))->v.carray)
    #define _t_vlarray(x)   (((struct type_rec *)(x))->v.vlarray)
    #define _t_parray(x)    (((struct type_rec *)(x))->v.parray)
    #define _t_struct(x)    (((struct type_rec *)(x))->v.struc)
    #define _t_vlstruct(x)  (((struct type_rec *)(x))->v.vlstruct)
    #define _t_enumty(x)    (((struct type_rec *)(x))->v.enumty)
    #define _t_function(x)  (((struct type_rec *)(x))->v.function)
    #define _t_set(x)       (((struct type_rec *)(x))->v.set)
    #define _t_bfile(x)     (((struct type_rec *)(x))->v.bfile)
    #define _t_label(x)     (((struct type_rec *)(x))->v.label)
    #define _t_specfunc(x)  (((struct type_rec *)(x))->v.specfunc)
    #define _t_class(x)     (((struct type_rec *)(x))->v.class)
    #define _t_memberptr(x) (((struct type_rec *)(x))->v.memberptr)

    struct type_rec {
        unsigned char  type_id;     /* The TID byte. */
        unsigned long  type_name;   /* Any associated type name. */
        unsigned short type_size;   /* The size of any object */
                                    /* of this type. */
        union {
            /* For TID_VOID, TID_LSTR, TID_DSTR, TID_SQUAD, TID_UQUAD,
               TID_FLOAT, TID_TPREAL, TID_DOUBLE, TID_LDOUBLE, TID_BCD4,
               TID_BCD8, TID_BCD10, TID_ADESC, TID_LABEL, TID_TFILE,
               TID_BOOL, TID_PWORD, TID_TBYTE types, no additional
               info. */

            struct {    /* only for TID_PSTR */
                unsigned char max_size;     /* Max string size */
            } pstr;

            struct {    /* for TID_PCHAR, TID_SCHAR, TID_SINT,
                           TID_SLONG, TID_UCHAR, TID_UINT and
                           TID_ULONG types */
                unsigned char filler;
                unsigned long parent;       /* Parent type */
                long          lower;        /* Minimum value */
                long          upper;        /* Maximum value */
            } range;

            struct {    /* for TID_BCDCOB only */
                unsigned char decimal;      /* Number of digits to */
                                            /* right of decimal point. */
            } bcd;

            struct {    /* TID_LABEL only */
                unsigned char nearfar;      /* 0 for near, 1 for far */
            } label;

            struct {    /* for TID_NEAR, TID_FAR, TID_NEAR386,
                           TID_FAR386 */
                unsigned char extra_info;   /* as follows: */
                unsigned long type_index;   /* pointed-to type */
            } ptr;
            /* For TID_NEAR and TID_NEAR386:
                   0x0  segment register unspecified.
                   0x1  ES relative
                   0x2  CS relative
                   0x3  SS relative
                   0x4  DS relative
                   0x5  FS relative
                   0x6  GS relative
               For TID_FAR and TID_FAR386:
                   0x0  far arithmetic.
                   0x1  huge arithmetic (real mode only). */

            struct {    /* For TID_SEG, TID_NREF, TID_FREF */
                unsigned char filler;
                unsigned long type_index;   /* pointed-to type */
            } seg;

            struct {    /* For TID_CARRAY only */
                unsigned char filler;
                unsigned long element;      /* Element type */
            } carray;

            struct {    /* For TID_VLARRAY only */
                unsigned short upper_size;  /* Upper 16 bits of size */
                unsigned long  element;     /* Element type */
            } vlarray;

            struct {    /* For TID_PARRAY only */
                unsigned char  filler;
                unsigned long  element;     /* Element type */
                unsigned short dimension;   /* Subscript type */
            } parray;

            struct {    /* For TID_STRUCT and TID_UNION */
                unsigned char filler;
                unsigned long members;      /* Index of members */
            } struc;

            struct {    /* For TID_VLSTRUCT and TID_VLUNION */
                unsigned short upper_size;  /* Upper 16 bits of size */
                unsigned long  members;     /* Index of members */
            } vlstruct;

            struct {    /* For TID_ENUM and TID_PENUM */
                unsigned char  filler;
                unsigned short parent;      /* type of parent */
                unsigned char  filler1;
                unsigned char  filler2;
                unsigned short lower;       /* Bottom of range */
                unsigned short upper;       /* Top of enum range */
                unsigned long  members;     /* Index of members */
            } enumty;

            struct {    /* For TID_FUNCTION only */
                unsigned language   : 7;
                unsigned is_varargs : 1;    /* Accepts Var args */
                unsigned long return_type;
            } function;
            /* The language field is as follows:
                   0x0  Near C function
                   0x1  Near Pascal function
                   0x2  Unused.
                   0x3  Unused.
                   0x4  Far C function
                   0x5  Far Pascal function
                   0x6  Unused.
                   0x7  Interrupt function */

            struct {    /* For TID_FUNCPROTOTYPE only */
                unsigned language   : 7;    /* see TID_FUNCTION */
                unsigned is_varargs : 1;    /* Accepts Var args */
                unsigned long  return_type;
                unsigned short param_start; /* starting index */
                                            /* in members table */
            } funcprototype;

            struct {    /* For TID_SET only */
                unsigned char filler;
                unsigned long parent;       /* Parent type */
            } set;

            struct {    /* For TID_BFILE only */
                unsigned char  filler;
                unsigned short element;     /* File element type */
            } bfile;

            struct {    /* For TID_SPECIALFUNC only */
                unsigned char  language;
                unsigned long  return_type;
                unsigned long  class_type;
                unsigned short virtual_offset;  /* in bytes */
                unsigned long  symbol_index;
                unsigned int   filler    : 12;
                unsigned int   info_bits : 4;
            } specfunc;

            struct {    /* For TID_CLASS only */
                unsigned char  filler;
                unsigned short class_index;
            } class;

            struct {    /* For TID_MEMBERPTR */
                unsigned char  filler;
                unsigned long  type_index;
                unsigned short class_index;
            } memberptr;
        } v;
    };

Members

The members table holds two completely distinct kinds of information. Structures and unions point into this table for their lists of members. Enums store their list of name/value pairs here.

Structure and union members

    struct struct_offset_rec {
        unsigned filler     : 6;
        unsigned offset_rec : 1;
        unsigned filler2    : 1;
        unsigned long new_offset;
    };
    /* The new_offset is the offset for the next member. */

    struct member_type {
        unsigned bit_field_size   : 6;
        unsigned offset_rec       : 1;
        unsigned end_of_structure : 1;
        unsigned long member_name;
        unsigned long member_type;
    };
    /****************************************
     The member_name is the index of the name.
     The member_type is the index of the type.
     ****************************************/

    struct enum_list_type {
        unsigned filler      : 7;
        unsigned end_of_list : 1;
        unsigned long enum_name;
        signed short  enum_value;
    };

end_of_list is 1 for the last enum value in the list.

enum_name is the index of the name.
enum_value is the value of the corresponding name.

    typedef union {
        struct struct_offset_rec o;
        struct member_type       m;
        struct enum_list_type    e;
    } member_rec;

bit_field_size is only important for bit field members. It is the size in bits of the member. For non-bit field members, the bit_field_size is 0.

offset_rec is zero for normal members, and nonzero for the special struct-offset record. If this bit is set, the next 2 bytes of the member record hold a word containing the new structure offset in bytes. This is used for Pascal variant records.

end_of_structure is 1 for the last field in a structure. This is the sign bit, so a simple negative/non-negative test will determine the end of the structure.

Holes in the structure (due to alignment padding) are represented using an unnamed bit-field member with a zero name index and a zero type index.

The offsets of union members are always zero. The offsets of structure members are computed from the sequence of the members in the table. The members are stored in ascending offset order.

A nested unnamed union inside a structure, or an unnamed structure inside a union, appears as an unnamed member. The debugger unravels this nesting to support unnamed structure/union members.

Class table

    typedef struct {
        unsigned short parent_index;  /* index into parent table */
        unsigned short parent_count;
        unsigned long  member_index;
        unsigned long  name_index;    /* tag */
        unsigned short virtual_ptr;   /* Offset from top of class data
                                         of virtual ptr */
        unsigned char  info;          /* Info bits:
                                         bit 0:   Class is a virtual base class
                                         bit 1:   Class is public
                                         bit 2-7: Offset of method in virtual
                                                  table */
    } class;

The class table defines the inheritance characteristics for each class. If a derived class has multiple inheritance, there will be multiple entries in the class table, indicating different parent classes.
If there are several classes derived from the same virtual base class, there will be separate class table entries for each virtual base class, and each base class entry will have the same symbol index.

The first byte of the member record for a given class entry serves both as the size of bitfields and as a set of bits indicating member attributes. These bits can be OR'd together to form the desired attribute.

    Value   Member attributes
    -----------------------------------------------------------
    0x80    Last member
    0x60    Static member (member_type points to symbol for
            the member)
    0x50    Static member function
    0x48    Method or member function (including virtual and
            static methods)
    0x44    Virtual method
    0x42    Constructor
    0x41    Destructor

For example, a virtual destructor will have a value of 0x4D:

      0x48 - method bit
    | 0x44 - virtual bit
    | 0x41 - destructor bit
      ----
      0x4D

Special cases

If member_record == 0x40, the record is a reset offset record.

If member_record == 0xc0, the next record is a bitfield. (This is only needed when the bitfield has some of the previous attributes. The attributes are indicated in this preceding record so that the first byte is free to indicate the field length in the bitfield record.)

If member_record == 0x43, the record is a conversion method.

If member_record == 0x80 and member_name == 0 and member_type == 0, then the Turbo Pascal linker has smart linked this class away.

Non-static, non-bitfield data members are always 0, or 0x80 if they're the last item.

Bit combining doesn't apply to the constructor, destructor, and conversion bits, since they are mutually exclusive.

Parent table

Each entry in the parent table has the following format:

    typedef struct {
        unsigned short class_index;   /* index into class table */
    } parent;

class_index is an index into the class table. If the highest bit is set, this parent is a virtual base class.
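A parent table entry therefore packs two facts into one 16-bit word. The sketch below shows one way to separate them; it is illustrative only. The names parent_ref and decode_parent are hypothetical, and treating the low 15 bits as the class table index is an assumption, since the text says only that the highest bit flags a virtual base class.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded form of one parent table entry (hypothetical helper type). */
struct parent_ref {
    uint16_t class_index;   /* index into the class table */
    int      is_virtual;    /* nonzero if this parent is a virtual base */
};

/* Split a raw class_index word into its index and virtual-base flag.
   Assumption: the index occupies the low 15 bits. */
static struct parent_ref decode_parent(uint16_t raw)
{
    struct parent_ref p;

    p.class_index = (uint16_t)(raw & 0x7FFFu);
    p.is_virtual  = (raw & 0x8000u) != 0;
    assert(p.class_index <= 0x7FFFu);
    return p;
}
```

Under these assumptions, a raw value of 0x8003 decodes as class table entry 3 inherited as a virtual base, and 0x0003 as the same class inherited non-virtually.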
Scope class table

    typedef struct {
        unsigned short class_index;   /* index into class table */
        unsigned short class_count;   /* number of classes */
    } scope_class;

A scope class table finds the classes defined within a particular scope. If any scope class records are needed, there must be one record for each scope record. This is identical to expanding the current scope record to contain the fields above, but it maintains backward compatibility with the earlier table, and allows non-object languages to avoid the overhead of bigger scope records.

Module class table

    typedef struct                    /* local classes */
    {
        unsigned short class_index;   /* index into class table */
        unsigned short class_count;   /* number of classes */
    } module_class;

A module class table finds the classes and overloads defined within a particular module. If any module class records are needed, there must be one for each module record. This is identical to expanding the current module record to contain the fields above, but it maintains backward compatibility with the earlier table, and allows non-object languages to avoid the overhead of bigger module records.

Coverage offset map table

    typedef struct {
        unsigned short offset;        /* index into Coverage Offset Table */
    } TCoverageOffsetMapTableEntry;

This table defines the starting index into the coverage offset table (which follows) for the given segment. It can be viewed as an array of TCoverageOffsetMapTableEntry entries, with the number of entries the same as the number of segment records in the segment table. Entries with an index of 0 indicate a lack of coverage offsets for the given segment. Note that the values in this table are not necessarily in ascending order.
Coverage offset table

    typedef struct {
        unsigned short offset;        /* offset into segment */
    } TCoverageOffsetTableEntry;

Each entry in the table corresponds to a starting offset for a block of code that is "atomic," meaning that if you start executing at the beginning of the block, you are guaranteed to reach the end.

Browser definition table

    struct TDefinitionRecord {
        unsigned long  symbol_index;  /* The index of the symbol in */
                                      /* the Symbols table */
        unsigned short file_index;    /* Which file the symbol is in */
        unsigned short line_number;   /* line number in the file */
    };

Optimized symbol table

    struct opt_symbol_record {
        unsigned short opt_symbol_next;    /* index to next record for
                                              this symbol */
        unsigned short opt_symbol_offset;  /* offset is treated as a
                                              register enum */
                                           /* See the Symbols section for
                                              details */
        unsigned char  opt_symbol_class;   /* Interpreted as for
                                              symbol_record */
        unsigned short opt_symbol_code_offset_start;
                                           /* start of optimization range */
        unsigned short opt_symbol_code_offset_end;
                                           /* end of optimization range */
    };

An optimized symbol has an entry in the symbols table whose type is SC_REGISTER (0x4), but whose register ID (offset) is greater than or equal to 0x28. The register ID (minus 0x28) is an index into the optimized symbols table. The record at that index is the first record in a linked list of records, linked through the opt_symbol_next field. The end of the list is marked by a 0 in that field. This record will have accurate information as to the true location of the variable in the opt_symbol_offset and opt_symbol_class fields, as per the symbol_record specification. Note that opt_symbol_class refers to the combination of the three symbol record bit fields: symbol_class, has_valid_BP, and return_address_word_offset.

The reason there is a list of opt_symbol_record objects is that a variable may exist in a register for some period of time, and then be "spilled" to a memory location, and possibly later reloaded into another register.
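The list walk described above can be sketched as follows. This is an illustration under stated assumptions, not code from the debugger: find_opt_location and OPT_REGISTER_BASE are hypothetical names, opt_symbol_next is assumed to index the same table directly, and the code range is assumed to include both its start and end offsets.

```c
#include <assert.h>
#include <stddef.h>

#define OPT_REGISTER_BASE 0x28   /* register IDs at or above this are indexes */

struct opt_symbol_record {
    unsigned short opt_symbol_next;
    unsigned short opt_symbol_offset;
    unsigned char  opt_symbol_class;
    unsigned short opt_symbol_code_offset_start;
    unsigned short opt_symbol_code_offset_end;
};

/* Return the record describing where the variable lives at code_offset,
   or NULL if register_id is an ordinary register or no range matches. */
static const struct opt_symbol_record *
find_opt_location(const struct opt_symbol_record *table, size_t count,
                  unsigned short register_id, unsigned short code_offset)
{
    size_t index, steps;

    assert(table != NULL);
    if (register_id < OPT_REGISTER_BASE)
        return NULL;                    /* not an optimized symbol */

    index = (size_t)(register_id - OPT_REGISTER_BASE);

    /* The step limit guards against a malformed, cyclic list. */
    for (steps = 0; steps < count && index < count; steps++) {
        const struct opt_symbol_record *rec = &table[index];

        if (code_offset >= rec->opt_symbol_code_offset_start &&
            code_offset <= rec->opt_symbol_code_offset_end)
            return rec;                 /* assumed inclusive range */
        if (rec->opt_symbol_next == 0)
            break;                      /* 0 marks the end of the list */
        index = rec->opt_symbol_next;
    }
    return NULL;
}
```

A caller would apply this only to symbols whose class is SC_REGISTER, then interpret opt_symbol_offset and opt_symbol_class of the returned record as it would for an ordinary symbol record.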
Module Optimization Flags Table, Reference Information Table

The DebugFlags field in the debug header extension currently has only one bit defined:

    #define DBG_OPT 0x0001

If this bit is set, then the application has optimized code somewhere in its modules.

The ModuleFlags table contains a dword entry of flags for each module in the Module table. It is indexed by the same module index that is used to index the module table. Note that the optimizations performed may be different from the optimizations requested when the module was compiled. Each dword currently describes the sorts of optimizations the compiler has done to the module. The following bits are defined:

    #define MO_globalCSEs    0x0001
    #define MO_localCSEs     0x0002
    #define MO_inductVars    0x0004
    #define MO_codeMotion    0x0008
    #define MO_regAlloc      0x0010
    #define MO_loadOptim     0x0020
    #define MO_loopOpt       0x0040
    #define MO_intrinsics    0x0080
    #define MO_deadStorElim  0x0100
    #define MO_copyProp      0x0200
    #define MO_jumpOpt       0x0400
    #define MO_speed_size    0x0800
    #define MO_noAliasing    0x1000

If the dword is 0, then the module contains no optimized code.

Reference Information Table

Names

Any symbolic name encountered in the symbol tables is referenced via an index into this region. Each identifier is stored with a trailing null byte.

Debugging Turbo Pascal overlays

Data at address pointed to by debugger_hook:

    typedef struct {
        unsigned short overlay_list;   /* start of linked list of overlay */
                                       /* header segs */
        unsigned short overlay_size;   /* smallest overlay buffer that */
                                       /* can be used */
        void far *     debugger_hook;  /* ptr to routine in debugger */
    } overlay;

A debugger must fill in debugger_hook after loading the program. debugger_hook is called by the overlay manager after any overlay is loaded. This allows the debugger to set breakpoints in the newly loaded segment. When called, ES contains the base segment of the overlay header, and BX contains the offset that the overlay manager will jump to in the newly loaded code.
(This is useful if an int 3F has been traced; an int 3F is followed by data and does not return.)

The actual segment of a particular overlaid segment is at offset 10h in the overlay header. If this value is zero, then the segment is not loaded. Data objects in an overlaid segment will contain the segment of the overlay header and the true offset in the code segment.

CHAPTER 4

Project file format

You can view a project file directly with a debugger or binary editor, but the Project File utilities make it a lot easier to understand and work with. This chapter describes the utilities and gives information for the Turbo C++ and Borland C++ project file format. The format is current as of Project file version 0x0701.

Project file utilities

How the utilities work

Using object-oriented technology, the online utilities provide access to project (.PRJ) files produced by Turbo C++ and Borland C++. The examples PROX, STRIPPRJ, and TRANCOPY show how you can see and change project files without needing to learn how the data is organized.

Two basic classes access the project files. TFileClass gets to files on disk (see fileclas.h and FILECLAS.CPP). TSection and descendants encapsulate each section of a project (see prjclass.h and PRJCLASS.CPP).

A project can be divided into seven discrete sections, each storing different information. PROX defines them as classes. For example, TOptionSection contains the settings of many options, such as Options|Compiler|Code generation|Model.
Here's the TSection class hierarchy and contents:

    TSection
     |
     +-- TOptionSection      Compiler, linker, and other information
     |                       shown in the Options menu
     +-- THeaderSection      Date and time of the project
     +-- TTransferSection    Information shown in Options|Transfer
     +-- TNoteSection        Contents of Window|Project note
     +-- TModuleSection      Contents of Project Window
     +-- TDependencySection  Contents of Project|View includes
     +-- TExtensionSection   Miscellaneous string contents of
                             Project|Local Options, referenced by
                             TModuleSection

TSection's derived classes have member functions to access their unique data in the most convenient way. For example, TModuleSection::GetModule returns a pointer to a structure containing information on the specified module. TOptionSection::GetCompilerModel returns the setting of the memory model.

The following table shows which examples explore a given section:

    Project Section      PROX   TRANCOPY   STRIPPRJ
    -----------------------------------------------
    OptionSection         X
    HeaderSection         X
    TransferSection                X
    NoteSection           X
    ModuleSection         X                   X
    DependencySection     X                   X
    ExtensionSection      X

Using the examples

To learn how to use the core classes, study the code in the project file utilities, and try the examples. With the source code, you can use the debugger to trace them.

Start with PROX, a collection of small, separate functions that perform a variety of tasks. Use PROX.PRJ as your source. PROX's syntax is:

    PROX [options] [.PRJ] [options]

Show overview (-o)
    Shows the file offset and size of each section. The Dependency section is missing until files are included during compilation.

Show modules (-p)
    Shows each item seen in the Project Window, along with Local Options such as the output name, command line overrides, translator, and whether or not debug info is excluded. When used on a complete project file, it teaches how to access the Module section of a project file using TModuleSection.
    It also demonstrates that each module may have an index to the Extensions section, stored in TExtensionSection, which contains additional strings for the output path, command line overrides, and translator when used with a project that contains Local Options.

Show modules with dependencies (-P)
    Same as -p, except that it also shows the include files (dependencies) of each module, stored in TDependencySection.

Show options (-t)
    Displays memory model, prolog/epilog, paths, and other selected options stored in TOptionSection.

Set options (-s)
    Modifies and writes memory model, prolog/epilog, paths, and other selected options stored in TOptionSection. Writes these changes to FOO.PRJ. You can open FOO.PRJ in the IDE to verify the modifications. However, do not use the project for actual work, as the options are not valid.

Show note (-n)
    Shows Window|Project note using TNoteSection.

Show header (-h)
    Outputs the age of the project using THeaderSection. Shows the date and time in ASCII, not hexadecimal.

TRANCOPY syntax

    TRANCOPY [-r]

Using PROX helps you understand most of the project. However, PROX totally ignores TTransferSection. With TRANCOPY you can copy the transfer section of one project into another project. Without the -r option, the source section is nondestructively merged into the destination section. With the -r option, the previous transfer items are replaced. The TRANCOPY executable ships with both Turbo C++ for Windows and Borland C++.

STRIPPRJ syntax

    STRIPPRJ

STRIPPRJ removes include file information (the Dependency section) from a project. It covers the same areas as PROX -P and PROX -s. You can regenerate the Dependency section by performing Compile|Build all.
Format of the Project file

    +--------------------+
    | Header             |
    +--------------------+
    | Option section     |
    +--------------------+
    | Header section     |
    +--------------------+
    | Transfer section   |
    +--------------------+
    | Note section       |
    +--------------------+
    | Module section     |
    +--------------------+
    | Dependency section |
    +--------------------+
    | Extension section  |
    +--------------------+
    | -1 (0xFFFF)        |
    +--------------------+

If you use the Project file utilities you probably won't have to learn the Project file format. The class hierarchy does most of the work for you. The rest of this chapter documents the format for direct access.

The first part of the .PRJ file is Header information used by the IDE to confirm the file's validity. The seven sections that follow differ in structure and in the kinds of information they contain. However, each has a section header that identifies the Block Type and the size of the data area.

Viewing .PRJ files is difficult. You must carefully track offsets to be sure you have the right data. If you are just getting started, you might follow the example. First use PROX -o PROX.PRJ and record the offset for each section. Type TD to enter the Turbo Debugger IDE and choose View|File|PROX.PRJ.

Header information

variable length: VisibleIDString = "Turbo C Project File ^Z"
    String designed to display if the project file is listed to the screen (null terminated).

7 bytes: Signature = "01 0D 12 17 01 1A 00"
    ID number that the IDE verifies.

2 bytes: Version
    Unsigned version number that is written into the project file when it is created. For internal use. The version number changes whenever any change occurs in either the project file format or data. This version must match that held in the IDE, or the project manager will not accept the file. The current version is 0x0701. In the file, the number reads 01 07 due to byte swapping.
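The header check can be sketched as follows. This is a minimal illustration rather than the IDE's own logic: prj_read_version and PRJ_SIGNATURE are hypothetical names, and the sketch assumes that the visible ID string ends at the first null byte and that the signature and version follow it immediately.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The 7-byte signature listed in the header description. */
static const unsigned char PRJ_SIGNATURE[7] =
    { 0x01, 0x0D, 0x12, 0x17, 0x01, 0x1A, 0x00 };

/* Validate the header held in buf[0..len-1].
   Returns the version number, or -1 if the header is not recognized.
   On success, *header_len (if not NULL) receives the header size in bytes. */
static long prj_read_version(const unsigned char *buf, size_t len,
                             size_t *header_len)
{
    const unsigned char *nul;
    size_t pos;
    long version;

    assert(buf != NULL);

    /* Assumption: the visible ID string ends at the first null byte. */
    nul = memchr(buf, 0, len);
    if (nul == NULL)
        return -1;
    pos = (size_t)(nul - buf) + 1;

    if (len - pos < sizeof PRJ_SIGNATURE + 2)
        return -1;                       /* too short for signature + version */
    if (memcmp(buf + pos, PRJ_SIGNATURE, sizeof PRJ_SIGNATURE) != 0)
        return -1;
    pos += sizeof PRJ_SIGNATURE;

    /* Low byte first: the bytes 01 07 in the file give 0x0701. */
    version = (long)buf[pos] | ((long)buf[pos + 1] << 8);
    if (header_len != NULL)
        *header_len = pos + 2;
    return version;
}
```

A tool built on this sketch would compare the returned value with the version it understands (0x0701 for the format described here) before reading any sections.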
Sections in the project file

Each section begins with a section header as follows:

    2 bytes: section Block Type identification number
    2 bytes: size of the following data area in the section

The Block Types are given here in decimal values. The size of the block does not include the 4-byte header.

Here are the sections that make up a project file.

Block Type 50-- Options section

          +-------------------+
      2   | Block Type = 50   |
          +-------------------+
      2   | Data size = n     |
          +===================+ --+
      2   | ID 1              |   |
          +-------------------+   |
      2   | Option 1 size = x |   |
          +-------------------+   |
      x   | Data for Option 1 |   |
          +===================+   |
      2   | ID 2              |   |
          +-------------------+   |
      2   | Option 2 size = y |   n
          +-------------------+   |
      y   | Data for Option 2 |   |
          +===================+   |
          |       . . .       |   |
          +-------------------+   |
      2   | ID = 0xFFFF       |   |
          +-------------------+   |
      2   | Size = 0          |   |
          +-------------------+ --+

variable length data: array of structures. For each Options menu item:

    2 bytes: Option ID
    2 bytes: size of Option
    variable length data: value, data, or content of Option

The structure for each Options menu item has a 4-byte header followed by the data or content of the item. The last ID is 0xFFFF with a size of 0.

You can write to Block Type 50 (32 00 in the file).

Block Type 51-- Header section

          +-------------------+
      2   | Block Type = 51   |
          +-------------------+
      2   | Size = 6          |
          +===================+
      2   | Reserved          |
          +-------------------+
      4   | Project age       |
          +-------------------+

    2 bytes: Reserved
    4 bytes: Age of project file =
                 seconds: 5 bits
                 minutes: 6 bits
                 hour:    5 bits
                 day:     5 bits
                 month:   4 bits
                 year:    7 bits

Block Type 51 (33 00 in the file) is used internally.

Block Type 10-- Transfer section

          +-------------------+
      2   | Block Type = 10   |
          +-------------------+
      2   | Size = n          |
          +===================+ --+
    323   | Transfer 1        |   |
          +-------------------+   |
    323   | Transfer 2        |   |
          +-------------------+   |
          |       . . .       |   n
          +-------------------+   |
    323   | Transfer k (last) |   |
          +-------------------+   |
    323   | Transfer k + 1    |   |
          | Translator = 0xFF |   |
          +-------------------+ --+

variable length data: array of structures. For each Options|Transfer item:

    1 byte:    translator[]; 1=true, 0=false, 0xFF is last
    40 bytes:  transfer title (Name)
    80 bytes:  transfer exe name (Program path)
    200 bytes: transfer command (Command line)
    2 bytes:   Hot key command

After the header, the total number of bytes used is a multiple of 323 (depending on how many transfer items are included).

You can write to Block Type 10 (0a 00 in the file).

Block Type 52-- Note section

          +--------------------+
      2   | Block Type = 52    |
          +--------------------+
      2   | Size = n           |
          +====================+
      n   | ASCII text of note |
          +--------------------+

variable length data after the header.

You can edit the note in Block Type 52 (34 00 in the file).

Block Type 53-- Module section

          +-------------------+
      2   | Block Type = 53   |
          +-------------------+
      2   | Size = n          |
          +===================+ --+
    108   | Module 1          |   |
          +-------------------+   |
    108   | Module 2          |   |
          +-------------------+   |
          |       . . .       |   n
          +-------------------+   |
    108   | Module k (last)   |   |
          +-------------------+   |
    108   | Module k + 1      |   |
          | ProjectItemType = |   |
          | NoMoreItems       |   |
          +-------------------+ --+

variable length data: each module represents an item in the Project Window, structured as follows:

    2 bytes: ProjectItemType =
                 reserved             0x0001
                 reserved             0x0002
                 Translator           0x0004
                 Overlay              0x0008  (Project window Options|Local Options)
                 CommandLineOverride  0x0010  (Local Options)
                 Exclude Debug info   0x0020  (Local Options)
                 Exclude from link    0x0040  (Local Options)
                 No more items        0x8000  (= 1, TRUE if this is the last item)
    2 bytes: DependencyID index into Block Type 54
    4 bytes: Obj age (0 if not available); see Block Type 51 for the age bits
    4 bytes: Code Size (-1 if not available)
    4 bytes: Data Size (-1 if not available)
    2 bytes: number of lines
    2 bytes: reserved: (= 0)
    80 bytes: filename of item
    (See Block Type 55 for the use of the indexes that follow.)
    2 bytes: Options enum index into Block Type 55
             (Local Options|Command-Line Options)
    2 bytes: Translator Title index into Block Type 55
             (Local Options|Translator)
    2 bytes: OutputName index into Block Type 55
             (Local Options output path)
    2 bytes: Reserved

You can write to unreserved parts of Block Type 53 (35 00 in the file).


Block Type 54 -- Dependency section

           +---------------------------+
        2  | Block Type = 54           |
           +---------------------------+
        2  | Size = n                  |
           +===========================+
        2  | number of offsets = m+2   |  |
           +---------------------------+  |
        2  | 0                         |  |
           +---------------------------+  |
        2  | offset 1 (index 2)        |  Part 1
           +---------------------------+  |
        2  | offset 2 (index 3)        |  |
           +-------     . . .    ------+  |
        2  | offset m (index m+1)      |  |
           +---------------------------+  |
        2  | 0xFFFF                    |  |
           +===========================+
                       . . .
           +===========================+
        2  | Type = 00 (from offset 1) |  |
           +---------------------------+  |
        2  | Number of dependencies    |  Part 2
           +---------------------------+  |
        x  | Array of dependencies     |  |
           +===========================+
                       . . .
           +===========================+
        1  | Type = FF (from offset 2) |  |
           +---------------------------+  |
        4  | Age of dependency         |  Part 3
           +---------------------------+  |
        y  | File name of dependency   |  |
           +===========================+
                       . . .
           +---------------------------+

A memory manager creates the Dependency section, which contains pointers to
include files; the layout is complex yet efficient. The data area starts
after the 4-byte header. It consists of three variable length parts
(basically offsets, indexes, and include files) as follows:

Part 1. Offsets

    variable length data: array of 2-byte integers containing offsets from
    the beginning of the data area directly following the 4-byte header.
    The number of offsets is the first element. See the diagram for the
    rest of the array content.

Part 2. Module dependencies

    variable length data: type, number of entries, and array of dependencies
    for each module in the project:

    2 bytes: Type = 00 00
    2 bytes: Number of dependency entries (multiple of 4)
    variable length data, for each dependency:
        2 bytes: index to array of offsets in Part 1. The last entry is -1.
        4 bytes: age when dependency last compiled for this module.
                 See Block Type 51 for age bits.

Part 3. Dependency information

    variable length data: series of bytes containing type, age, and file
    name for each dependency:

    1 byte:  Type = FF
    4 bytes: age of dependency (see Block Type 51 for age bits)
    variable length string: file name of dependency, NULL terminated

You can write to unreserved parts of Block Type 54 (36 00 in the file).

Here are some tips for tracking a dependency entry in a Project file,
FILENAME.PRJ.

Prepare as follows:

    1. Run PROX -o FILENAME.PRJ to make note of the project file offsets of
       the Module and Dependency sections.
    2. Enter TD and open the file under View|File|Open.

Get the Dependency ID offset as follows:

    1. Locate the Module section offset (35 00 value).
    2. Count four bytes, skipping over the header.
    3. Count two bytes, skipping over the Project item type.
    4. Record the 2-byte Dependency ID offset.

Find the Module dependency entry:

    1. Locate the Dependency section offset (36 00 value).
    2. Count four bytes to the start of the data area.
    3. Count 2 * Dependency ID offset to read the offset to the Module
       dependencies. See Part 1 on the diagram.
    4. Return to the start of the data area.
    5. Count off the Module dependencies offset.
    6. You should be at a Type 00 00 location. See Part 2 on the diagram.

Find the dependency information:

    1. Skip over 4 bytes for the header (type and number of entries).
    2. Read the index.
    3. Go to the beginning of the data area.
    4. Count 2 * index.
    5. Read the offset of the Dependency information.
    6. Go to this offset. See Part 3 on the diagram.
    7. Skip 5 bytes past the type and age data.
    8. Read the file name (NULL terminated).

For each dependency, read the index (separated from the previous one by
4 bytes of age data) and repeat steps 3-6. The part ends with 0xFFFF.


Block Type 55 -- Extension section

           +---------------------------+
        2  | Block Type = 55           |
           +---------------------------+
        2  | Size = n                  |
           +===========================+
        2  | number of offsets = m+2   |  |
           +---------------------------+  |
        2  | 0                         |  |
           +---------------------------+  |
        2  | offset 1 (index 2)        |  Part 1
           +---------------------------+  |
        2  | offset 2 (index 3)        |  |
           +-------     . . .    ------+  |
        2  | offset m (index m+1)      |  |
           +---------------------------+  |
        2  | 0xFFFF                    |  |
           +===========================+
                       . . .
           +===========================+
        x  | String1                   |  |
           +---------------------------+  |
        y  | String2                   |  Part 2
           +---------------------------+  |
        z  | String3                   |  |
           +-------     . . .    ------+  |
       zz  | Stringm                   |  |
           +---------------------------+

This section is entered with an index into an integer array (Part 1). The
index is obtained from the Options, Translator Title, and OutputName fields
of each module in Block Type 53.

Here are some tips for tracking Options, Translator Title, and OutputName
entries for a module in a Project file, FILENAME.PRJ.

Prepare as follows:

    1. Run PROX -o FILENAME.PRJ to make note of the project file offsets of
       the Module and Extension sections.
    2. Enter TD and open the file under View|File|Open.

Get the Options, Translator, and OutputName offsets as follows:

    1. Locate the Module section offset (35 00 value).
    2. Count 4 bytes, skipping the header.
    3. Count 100 bytes.
    4. Record the next three 2-byte Options, Translator, and OutputName
       offsets.

Find the entries:

    1. Locate the Extension section offset (37 00 value).
    2. Count four bytes to the start of the data area.
    3. Count 2 * Options offset to read the offset to the string. See
       Part 1 on the diagram.
    4. Return to the start of the data area.
    5. Count off the string's offset.
    6. Read the string. See Part 2 on the diagram.
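
The "Find the entries" steps above can be sketched in C. This is only an
illustration under stated assumptions, not Borland's code: the function name
ext_string is hypothetical, the caller is assumed to have already loaded the
data area (the bytes after the 4-byte section header) into memory, the
strings are assumed to be zero-terminated, and an offset of 0 or 0xFFFF is
treated as "no string". Values are read low byte first, matching the
"37 00" byte order shown for the Block Type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 16-bit value stored low byte first (as in "37 00" for 55). */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: look up one Extension-section string.
 * data  = start of the data area (just past the 4-byte section header)
 * size  = number of bytes in the data area
 * index = Options, Translator Title, or OutputName value from a module
 * Returns a pointer into data, or NULL if nothing usable is found. */
const char *ext_string(const unsigned char *data, size_t size, unsigned index)
{
    size_t slot = 2u * (size_t)index;   /* step 3: count 2 * index bytes */
    uint16_t off;

    assert(data != NULL);
    if (slot + 2 > size)
        return NULL;                    /* index lies outside the section */
    off = rd16(data + slot);            /* offset to the string (Part 1) */

    /* Assumption: 0 and 0xFFFF mark "no string" / end of the array. */
    if (off == 0 || off == 0xFFFFu || off >= size)
        return NULL;

    /* Assumption: strings in Part 2 are zero-terminated. */
    if (memchr(data + off, '\0', size - off) == NULL)
        return NULL;
    return (const char *)(data + off);  /* steps 4-6: read the string */
}
```

A caller would pass the three 2-byte values recorded from the module entry,
one at a time, and print whatever string comes back.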
CHAPTER 5
________________________________________________________________________________

The BGI driver toolkit

The Borland Graphics Interface (BGI) is a fast, compact, and
device-independent software package for graphics development built into the
Turbo Pascal and Borland C++ language products. Device independence is
achieved via loadable device-specific drivers called from a common kernel.
In this chapter we describe basic BGI functionality and how to create new
device drivers.

    File Name      File Description
    __________________________________________________________
    BH.C           BGI loader header-building program source
    BH.EXE         BGI loader header-building program executable
    DEVICE.INC     Structure and macro definition file
    DEBVECT.ASM    Vector table for sample (DEBUG) driver
    DEBUG.C        Main module for sample driver
    MAKEFILE       Build file
    BUILD.BAT      A batch file for MAKE-phobics
    __________________________________________________________

BGI run-time architecture

Programs produced by Borland languages create graphics via two entities
acting in concert: the generic BGI Kernel and a device-specific driver.
Typically, an application built with a Borland compiler will include several
device driver files on the distribution disk (extension .BGI) so that the
program can run on various types of screens and printers. Graphics requests
(for example, draw line, draw bar, etc.) are sent by the application to the
BGI Kernel, which in turn makes requests of the device driver to actually
manipulate the hardware.

A BGI device driver is a binary image; that is, a sequence of bytes without
symbols or other linking information. The driver begins with a short header,
followed by a vector table containing the entry points to the functions
inside. The balance of the driver comprises the code and data required to
manipulate the target graphics hardware.
All code and data references in the driver must be near (i.e., small model,
offset only), and the entire driver, both code and data, must fit within
64K. In use, the device driver can count on its being loaded on a paragraph
boundary. The BGI Kernel uses a register-based calling convention to
communicate with the device driver (described in detail below).

BGI Graphics Model

When considering the functions listed here, keep in mind that BGI performs
most drawing operations using an implicit drawing or tracing color (COLOR),
fill color (FILLCOLOR), and pattern (FILLPATTERN). For example, the PIESLICE
call accepts no pattern or color information, but instead uses the
previously set COLOR value to trace the edge of the slice, and the
previously set FILLCOLOR and FILLPATTERN values for the interior.

For efficiency, many operations take place at the position of the current
pointer, or CP. For example, the LINE routine accepts only a single (x,y)
coordinate pair, using the CP as the starting point of the line and the
passed coordinate pair as the ending point. Many functions (LINE, to name
one) affect CP, and the MOVE function can be used to explicitly adjust CP.

The BGI coordinate system places the origin (pixel 0,0) at the upper
left-hand corner of the screen.

Header Section

The device header section, which must be at the beginning of the device
driver, is built using macro BGI defined in file DEVICE.INC. The BGI macro
takes the name of the device driver to be built as an argument. For example,
a driver named DEBUG would begin as shown here:

    CSEG    SEGMENT PARA PUBLIC 'CODE'  ; any segment naming may be used
            ASSUME  DS:CSEG, CS:CSEG    ; cs=ds
            CODESEG
            INCLUDE DEVICE.INC          ; include the device.inc file
            BGI     DEBUG               ; declare the device header section

The device header section declares a special entry point known as EMULATE.
If the action of a device driver vector is not supported by the hardware of
a device, the vector entry should contain the entry EMULATE.
This will be patched at load time to contain a jump to the Kernel's
emulation routine. These routines will emulate the action of the vector by
breaking down the request into simpler primitives. For example, if the
hardware has the functionality to draw arcs, the arc vector will contain the
address of the routine to dispatch the arc data to the hardware and would
appear as follows:

    dw      offset ARC          ; Vector to the arc routine

If, as is often the case, the hardware doesn't have the functionality to
display arcs, the vector would instead contain the EMULATE vector:

    dw      EMULATE

The Kernel has emulation support for the following vectors:

    BAR               Filling 3D rectangles
    ARC               Elliptical arc rendering
    PIESLICE          Elliptical pie slices
    FILLED_ELLIPSE    Filled ellipses

The driver status table

BGI requires that each driver contain a Driver Status Table (DST) to
determine the basic characteristics of the device that the driver addresses.
As an example, the DST for a CGA display is shown here:

    STATUS  STRUC
    STAT    DB  0       ; Current Device Status (0 = No Errors)
    DEVTYP  DB  0       ; Device Type Identifier (must be 0)
    XRES    DW  639     ; Device Full Resolution in X Direction
    YRES    DW  199     ; Device Full Resolution in Y Direction
    XEFRES  DW  639     ; Device Effective X Resolution
    YEFRES  DW  199     ; Device Effective Y Resolution
    XINCH   DW  9000    ; Device X Size in inches*1000
    YINCH   DW  7000    ; Device Y Size in inches*1000
    ASPEC   DW  4500    ; Aspect Ratio = (y_size/x_size) * 10000
            DB  8h
            DB  8h      ; for compatibility, use these values
            DB  90h
            DB  90h
    STATUS  ENDS

The BGI interface provides a system for reporting errors to the BGI Kernel
and to the higher level code developed using Borland's language packages.
This is done using the STAT field of the Driver Status Table. This field
should be filled in by the driver code if an error is detected during the
execution of the device installation (INSTALL). The following error codes
are predefined in include file GRAPHICS.H for Turbo C and in the Graphics
unit for Turbo Pascal.

    grOk               =   0    Normal Operation, No errors
    grNoInitGraph      =  -1
    grNotDetected      =  -2
    grFileNotFound     =  -3
    grInvalidDriver    =  -4
    grNoLoadMem        =  -5
    grNoScanMem        =  -6
    grNoFloodMem       =  -7
    grFontNotFound     =  -8
    grNoFontMem        =  -9
    grInvalidMode      = -10
    grError            = -11    Generic Driver Error
    grIOerror          = -12
    grInvalidFont      = -13
    grInvalidFontNum   = -14
    grInvalidDeviceNum = -15

The next field in the Device Status Table, DEVTYP, describes the class of
the device that the driver controls; for screen devices, this value is
always 0.

The next four fields, XRES, YRES, XEFRES, and YEFRES, contain the number of
pixels available to BGI on this device in the horizontal and vertical
dimensions, minus one. For screen devices, XRES=XEFRES and YRES=YEFRES. The
XINCH and YINCH fields are the number of inches horizontally and vertically
into which the device's pixels are mapped, times 1000. These fields in
conjunction with XRES and YRES permit device resolution (DPI, or dots per
inch) calculation.

    Horizontal resolution (DPI) = (XRES+1) / (XINCH/1000)
    Vertical resolution (DPI)   = (YRES+1) / (YINCH/1000)

The ASPEC (aspect ratio) field is effectively a multiplier/divisor pair (the
divisor is always 10000) that is applied to Y coordinate values to produce
aspect-ratio adjusted images (for example, round circles). For example, an
ASPEC field of 4500 implies that the application will have to transform Y
coordinates by the ratio 4500/10000 when drawing circles to that device if
it expects them to be round. Individual monitor variations may require an
additional adjustment by the application.

The device driver vector table

The routines in the device driver are accessed via a vector table. This
table is at the beginning of the driver and contains 16-bit offsets to
subroutines and configuration tables within the driver. The format of the
vector table is shown below.

    VECTOR_TABLE:
        DW  INSTALL         ; Driver initialization and installation
        DW  INIT            ; Initialize device for output
        DW  CLEAR           ; Clear graphics device; get fresh screen
        DW  POST            ; Exit from graphics mode, unload plotter
        DW  MOVE            ; Move Current Pointer (CP) to (X,Y)
        DW  DRAW            ; Draw Line from (CP) to (X,Y)
        DW  VECT            ; Draw line from (X0,Y0) to (X1,Y1)
        DW  EMULATE         ; Reserved, must contain Emulate vector
        DW  BAR             ; Filled 3D bar from (CP) to (X,Y)
        DW  PATBAR          ; Patterned rectangle from (X,Y) to (X1,Y1)
        DW  ARC             ; Define ARC
        DW  PIESLICE        ; Define an elliptical pie slice
        DW  FILLED_ELLIPSE  ; Draw a filled ellipse
        DW  PALETTE         ; Load a palette entry
        DW  ALLPALETTE      ; Load the full palette
        DW  COLOR           ; Set current drawing color/background
        DW  FILLSTYLE       ; Filling control and style
        DW  LINESTYLE       ; Line drawing style control
        DW  TEXTSTYLE       ; Hardware Font control
        DW  TEXT            ; Hardware Draw text at (CP)
        DW  TEXTSIZ         ; Hardware Font size query
        DW  RESERVED        ; Reserved
        DW  FLOODFILL       ; Fill a bounded region
        DW  GETPIX          ; Read a pixel from (X,Y)
        DW  PUTPIX          ; Write a pixel to (X,Y)
        DW  BITMAPUTIL      ; Bitmap Size query function
        DW  SAVEBITMAP      ; BITBLT from screen to system memory
        DW  RESTOREBITMAP   ; BITBLT from system memory to screen
        DW  SETCLIP         ; Define a clipping rectangle
        DW  COLOR_QUERY     ; Color Table Information Query
    ;
    ; 35 additional vectors are reserved for Borland's future use.
    ;
        DW  RESERVED        ; Reserved for Borland's use (1)
        DW  RESERVED        ; Reserved for Borland's use (2)
        DW  RESERVED        ; Reserved for Borland's use (3)
            .
            .
            .
        DW  RESERVED        ; Reserved for Borland's use (33)
        DW  RESERVED        ; Reserved for Borland's use (34)
        DW  RESERVED        ; Reserved for Borland's use (35)
    ;
    ; Any vectors following this block may be used by
    ; independent device driver developers as they see fit.
    ;

Vector Descriptions

The following information describes the input, output, and function of each
of the functions accessed through the device vector table.

DW  offset INSTALL      ; device driver installation

The Kernel calls the INSTALL vector to prepare the device driver for use. A
function code is passed in AL. The following function codes are defined:

>>> Install Device:  AL = 00

    Input:   CL = Mode Number for device
    Return:  ES:BX --> Device Status Table (see STATUS structure)

The INSTALL function is intended to inform the driver of the operating
parameters that will be used. The device should not be switched to graphics
mode (see INIT). On input, CL contains the mode in which the device will
operate (refer to the BGI setgraphmode statement).

The return value from the Install Device function is a pointer to a Device
Status Table (described earlier).

>>> Mode Query:  AL = 001h

    Input:   Nothing
    Return:  CX  The number of modes supported by this device.

The MODE QUERY function inquires about the maximum number of modes supported
by this device driver.

>>> Mode Names:  AL = 002h

    Input:   CX  The mode number for the query.
    Return:  ES:BX --> a Pascal string containing the name

The MODE NAMES function inquires about the ASCII form of the mode number
present in CX. The return value in ES:BX points to a Pascal string
describing the given mode. (Note: A Pascal, or _length_, string is a string
in which the first byte of data is the number of characters in the string,
followed by the string data itself.) To ease access to these strings from C,
the strings should be followed by a zero byte, although this zero byte
should not be included in the string length. The following is an example of
this format:

    NAME:   db  16, '1280 x 1024 Mode', 0

==================================================================
DW  offset INIT         ; Initialize device for output

    Input:   ES:BX --> Device Information Table
    Return:  Nothing

This vector changes an already INSTALLed device from text mode to graphics
mode. This vector should also initialize any default palettes and drawing
mode information as required.
The input to this vector is a device information table (DIT). The format of
the DIT is shown below and contains the background color and an
initialization flag. If the device requires additional information at INIT
time, these values can be appended to the DIT. There is no return value for
this function. If an error occurs during device initialization, the STAT
field of the Device Status Table should be loaded with the appropriate error
value.

    ; ************** Device Information Table Definition **************

    struct  DIT
            DB      0           ; Background color for initializing screen
            DB      0           ; Init flag; 0A5h = don't init; anything
                                ;   else = init
            DB      64 dup 0    ; Reserved for Borland's future use
                                ; additional user information here
    DIT     ends

==================================================================
DW  offset CLEAR        ; Clear the graphics device

    Input:   Nothing
    Return:  Nothing

This vector clears the graphics device to a known state. In the case of a
CRT device, the screen is cleared. In the case of a printer or plotter, the
paper is advanced, and pens are returned to the station.

DW  offset POST         ; Exit from graphics mode

    Input:   Nothing
    Return:  Nothing

This routine closes the graphics system. In the case of graphics screens or
printers, the mode should be returned to text mode. For plotters, the paper
should be unloaded and the pens should be returned to station.

DW  offset MOVE         ; Move the current drawing pointer

    Input:   AX  the new CP x coordinate
             BX  the new CP y coordinate
    Return:  Nothing

Sets the Driver's current pointer (CP) to (AX,BX). This function is used
prior to any of the TEXT, ARC, SYMBOL, DRAW, FLOODFILL, BAR, or PIESLICE
routines to set the position where drawing is to take place.

DW  offset DRAW         ; Draw a line from the (CP) to (X,Y)

    Input:   AX  The ending x coordinate for the line
             BX  The ending y coordinate for the line
    Return:  Nothing

Draws a line from the CP to (X,Y). The current LINESTYLE setting is used.
The current pointer (CP) is updated to the line's endpoint.

DW  VECT                ; Draw line from (X1,Y1) to (X2,Y2)

    Input:   AX  X1; The beginning X coordinate for the line
             BX  Y1; The beginning Y coordinate for the line
             CX  X2; The ending X coordinate for the line
             DX  Y2; The ending Y coordinate for the line
    Return:  Nothing

Draws a line from (X1,Y1) to (X2,Y2). The current LINESTYLE setting is used
to draw the line. Note: CP is NOT changed by this vector.

DW  BAR                 ; fill and outline rectangle (CP),(X,Y)

    Input:   AX  X--right edge of rectangle
             BX  Y--bottom edge of rectangle
             CX  3D = width of 3D bar (ht := .75 * wdt); 0 = no 3D effect
             DX  3D bar top flag; if CX <> 0, and DX = 0, draw a top
    Return:  Nothing

Fills and outlines a bar (rectangle) using the current COLOR, FILLCOLOR, and
FILLPATTERN. The current pointer defines the upper left corner of the
rectangle and (X,Y) is lower right. An optional 3D shadow effect (intended
for business graphics programs) is obtained by making CX nonzero. DX then
serves as a flag indicating whether a top should be drawn on the bar.

DW  PATBAR              ; fill rectangle (X1,Y1), (X2,Y2)

    Input:   AX  X1--the rectangle's left coordinate
             BX  Y1--the rectangle's top coordinate
             CX  X2--the rectangle's right coordinate
             DX  Y2--the rectangle's bottom coordinate
    Return:  Nothing

Fills (but doesn't outline) the indicated rectangle with the current fill
pattern and fill color.

DW  ARC                 ; Draw an elliptical arc

    Input:   AX  The starting angle of the arc in degrees (0-360)
             BX  The ending angle of the arc in degrees (0-360)
             CX  X radius of the elliptical arc
             DX  Y radius of the elliptical arc
    Return:  Nothing

ARC draws an elliptical arc using the (CP) as the center point of the arc,
from the given start angle to the given end angle. To get circular arcs the
application (not the driver) must adjust the Y radius as follows:

    YRAD := XRAD * (ASPEC / 10000)

where ASPEC is the aspect value stored in the DST.
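
As a worked illustration of the YRAD adjustment and of the DPI formulas
given with the Driver Status Table, here is a minimal C sketch. The function
names circular_yrad and device_dpi are hypothetical, and truncating integer
division is an assumption; a real application might prefer to round.

```c
#include <assert.h>

/* Y radius that makes an arc look circular on a device whose Driver
 * Status Table reports the given ASPEC value (divisor is always 10000). */
int circular_yrad(int xrad, int aspec)
{
    assert(xrad >= 0 && aspec > 0);
    /* Multiply in long so xrad * aspec cannot overflow a 16-bit int. */
    return (int)((long)xrad * aspec / 10000L);
}

/* Dots per inch from a resolution field (XRES or YRES, which hold the
 * pixel count minus one) and a size field (XINCH or YINCH, inches*1000). */
int device_dpi(int res, int inch1000)
{
    assert(res >= 0 && inch1000 > 0);
    return (int)(((long)res + 1L) * 1000L / inch1000);
}
```

With the CGA values from the sample DST (XRES=639, XINCH=9000, ASPEC=4500),
device_dpi(639, 9000) gives 71 and circular_yrad(100, 4500) gives 45.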

DW  PIESLICE            ; Draw an elliptical pie slice

    Input:   AX  The starting angle of the slice in degrees (0-360)
             BX  The ending angle of the slice in degrees (0-360)
             CX  X radius of the elliptical slice
             DX  Y radius of the elliptical slice
    Return:  Nothing

PIESLICE draws a filled elliptical pie slice (or wedge) using CP as the
center of the slice, from the given start angle to the given end angle. The
current FILLPATTERN and FILLCOLOR are used to fill the slice and it is
outlined in the current COLOR. To get circular pie slices, the application
(not the driver) must adjust the Y radius as follows:

    YRAD := XRAD * ASPEC / 10000

where ASPEC is the aspect value stored in the driver's DST.

DW  FILLED_ELLIPSE      ; Draw a filled ellipse at (CP)

    Input:   AX  X Radius of the ellipse
             BX  Y Radius of the ellipse
    Return:  Nothing

This vector draws a filled ellipse. The center point of the ellipse is
assumed to be at the current pointer (CP). The AX Register contains the X
Radius of the ellipse, and the BX Register contains the Y Radius of the
ellipse.

DW  PALETTE             ; Load a color entry into the Palette

    Input:   AX  The index number and function code for load
             BX  The color value to load into the palette
    Return:  Nothing

The PALETTE vector loads single entries into the palette. The register AX
contains the function code for the load action and the index of the color
table entry to be loaded. The upper two bits of AX determine the action to
be taken. The table below summarizes the actions. If the control bits are
00, the color table index in (AX AND 03FFFh) is loaded with the value in BX.
If the control bits are 10, the color table index in (AX AND 03FFFh) is
loaded with the RGB value in (Red=BX, Green=CX, and Blue=DX). If the control
bits are 11, the color table entry for the background is loaded with the
value in BX.

    Control Bits    Color Value and Index
    00              Register BX contains color, AX is index
    01              not used
    10              Red=BX Green=CX Blue=DX, AX is index
    11              Register BX contains color for background

==================================================================
DW  ALLPALETTE          ; Load the full palette

    Input:   ES:BX --> array of palette entries
    Return:  Nothing

The ALLPALETTE routine loads the entire palette in one driver call. The
register pair ES:BX points to the table of values to be loaded into the
palette. The number of entries is determined by the color entries in the
Driver Status Table. The background color is not explicitly loaded with this
command.

DW  COLOR               ; Load the current drawing color.

    Input:   AL  The index number of the current drawing color
             AH  The index number of the fill color
    Return:  Nothing

The COLOR vector determines the current drawing color. The value in AL is
the index into the palette of the new current drawing color. The value in
the AH register is the color index of the new fill color. All primitives are
drawn with the current drawing color until the color is changed. The fill
color is used as the interior color for the bar, polygon, pie slice, and
floodfill primitives.

==================================================================
DW  FILLSTYLE           ; Set the filling pattern

    Input:   AL     Primary fill pattern number
             ES:BX  If pattern number is 0FFh, points to user-defined
                    pattern mask.
    Return:  Nothing

Sets the fill pattern for drawing. The fill pattern is used to fill all
bounded regions (BAR, POLY, and PIESLICE).
The numbers for the predefined fill patterns are as follows:

    Code  Description      8 Byte fill pattern
    0     No Fill          000h, 000h, 000h, 000h, 000h, 000h, 000h, 000h
    1     Solid Fill       0FFh, 0FFh, 0FFh, 0FFh, 0FFh, 0FFh, 0FFh, 0FFh
    2     Line Fill        0FFh, 0FFh, 000h, 000h, 0FFh, 0FFh, 000h, 000h
    3     Lt Slash Fill    001h, 002h, 004h, 008h, 010h, 020h, 040h, 080h
    4     Slash Fill       0E0h, 0C1h, 083h, 007h, 00Eh, 01Ch, 038h, 070h
    5     Backslash Fill   0F0h, 078h, 03Ch, 01Eh, 00Fh, 087h, 0C3h, 0E1h
    6     Lt Bkslash Fill  0A5h, 0D2h, 069h, 0B4h, 05Ah, 02Dh, 096h, 04Bh
    7     Hatch Fill       0FFh, 088h, 088h, 088h, 0FFh, 088h, 088h, 088h
    8     XHatch Fill      081h, 042h, 024h, 018h, 018h, 024h, 042h, 081h
    9     Interleave Fill  0CCh, 033h, 0CCh, 033h, 0CCh, 033h, 0CCh, 033h
    10    Wide Dot Fill    080h, 000h, 008h, 000h, 080h, 000h, 008h, 000h
    11    Close Dot Fill   088h, 000h, 022h, 000h, 088h, 000h, 022h, 000h
    0FFh  User is defining the pattern of the fill.

In the case of a user-defined fill pattern, the register pair ES:BX points
to 8 bytes of data arranged as an 8x8 bit pattern to be used for the fill
pattern.

DW  LINESTYLE           ; Set the line drawing pattern

    Input:   AL  Line pattern number
             BX  User-defined line drawing pattern
             CX  Line width for drawing
    Return:  Nothing

Sets the current line-drawing style and the width of the line. The line
width is either one pixel or three pixels. The following table defines the
default line styles:

    Code      Description               16 Bit Pattern
    AL = 0    Solid Line Style          1111111111111111B
    AL = 1    Dotted Line               1100110011001100B
    AL = 2    Center Line               1111110001111000B
    AL = 3    Dashed line               1111100011111000B
    AL = 4    User-defined line style

If the value in AL is four, the user is defining a line style in the BX
register. If the value in AL is not four, then the value in register BX is
ignored.
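
The LINESTYLE selection rule can be sketched in C as follows. This is an
illustration only: line_pattern and pattern_bit are hypothetical names, the
fallback to a solid line for codes above 4 is an assumption, and so is the
bit order (the leftmost bit of the pattern as printed is taken to be the
first pixel of the line, with the pattern repeating every 16 pixels).

```c
#include <assert.h>

/* Default line patterns from the LINESTYLE table, indexed by AL = 0..3. */
static const unsigned int default_patterns[4] = {
    0xFFFFu,    /* 0: solid   1111111111111111B */
    0xCCCCu,    /* 1: dotted  1100110011001100B */
    0xFC78u,    /* 2: center  1111110001111000B */
    0xF8F8u     /* 3: dashed  1111100011111000B */
};

/* Pick the 16-bit pattern for a LINESTYLE call: code is the AL value,
 * user is the BX value, which only matters when code is 4. */
unsigned int line_pattern(unsigned int code, unsigned int user)
{
    if (code == 4u)
        return user & 0xFFFFu;
    if (code < 4u)
        return default_patterns[code];
    return 0xFFFFu;     /* assumption: unknown codes fall back to solid */
}

/* Return 1 if the pixel at position step along the line should be
 * plotted. Assumption: the most significant bit is the first pixel. */
int pattern_bit(unsigned int pattern, unsigned int step)
{
    assert(pattern <= 0xFFFFu);
    return (int)((pattern >> (15u - (step & 15u))) & 1u);
}
```

A line-drawing loop would call pattern_bit once per pixel step and skip the
pixels for which it returns 0.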

DW  TEXTSTYLE           ; Hardware text style control

    Input:   AL  Hardware font number
             AH  Hardware font orientation
                 0 = Normal, 1 = 90 Degree, 2 = Down
             BX  Desired X Character (size in graphics units)
             CX  Desired Y Character (size in graphics units)
    Return:  BX  Closest X Character size available (in graphics units)
             CX  Closest Y Character size available (in graphics units)

The TEXTSTYLE vector defines the attributes of the hardware font for output.
The parameters affected are the hardware font to be used, the orientation of
the font for output, and the desired height and width of the font output.
All subsequent text will be drawn using these attributes.

If the desired size is not supported by the current device, the closest
available match to the desired size should be used. The return value from
this function gives the dimensions of the font (in pixels) that will
actually be used.

For example, if the desired font is 8x10 pixels and the device supports 8x8
and 16x16 fonts, the closest match will be the 8x8. The output of the
function will be BX = 8, and CX = 8.

DW  TEXT                ; Hardware text output at (CP)

    Input:   ES:BX --> ASCII text of the string
             CX  The length (in characters) of the string.

This function sends hardware text to the output device. The text is output
to the device beginning at the (CP). The (CP) is assumed to be at the upper
left of the string.

DW  TEXTSIZ             ; Determine the height and width of text
                        ; strings in graphics units.

    Input:   ES:BX --> ASCII text of the string
             CX  The length (in characters) of the string.
    Return:  BX  The width of the string in graphics units.
             CX  The height of the string in graphics units.

This function determines the actual physical width and height of a text
string. The current text attributes (set by TEXTSTYLE) are used to determine
the actual dimensions of a string without displaying it. The application can
thereby determine how a specific string will fit and reduce or increase the
font size as required.
There is NO graphics output for this vector. If an error occurs during
length calculation, the STAT field of the Device Status Record should be
marked with the device error code.

DW  FLOODFILL           ; Fill a bounded region using a flood fill

    Input:   AX  The x coordinate for the seed point
             BX  The y coordinate for the seed point
             CL  The boundary color for the Flood Fill
    Return:  Nothing (Errors are returned in Device Status STAT field).

This function is called to fill a bounded region on bitmap devices. The
(X,Y) input coordinate is used as the seed point for the flood fill. (CP)
becomes the seed point. The current FILLPATTERN is used to flood the region.

DW  GETPIXEL            ; Read a pixel from the graphics screen

    Input:   AX  The x coordinate of the pixel
             BX  The y coordinate of the pixel
    Return:  DL  The color index of the pixel read from the screen.

GETPIXEL reads the color index value of a single pixel from the graphics
screen. The color index value is returned in the DL register.

DW  PUTPIXEL            ; Write a pixel to the graphics screen

    Input:   AX  The x coordinate of the pixel
             BX  The y coordinate of the pixel
             DL  The color index of the pixel to write to the screen.
    Return:  Nothing

PUTPIXEL writes a single pixel with the color index value contained in the
DL register.

DW  BITMAPUTIL          ; Bitmap Utilities Function Table

    Input:   Nothing
    Return:  ES:BX --> BitMap Utility Table.

The BITMAPUTIL vector loads a pointer into ES:BX, which is the base of a
table defining special case-entry points used for pixel manipulation. These
functions are currently only called by the ellipse emulation routines that
are in the BGI Kernel. If the device driver does not use emulation for
ellipses, this entry does not need to be implemented. This entry was
provided because some hardware requires additional commands to enter and
exit pixel mode, thus adding overhead to the GETPIXEL and PUTPIXEL vectors.
This overhead affected the drawing speed of the ellipse emulation routines.
These entry points are provided so that the ellipse emulation routines can
enter pixel mode, and remain in pixel mode for the duration of the
ellipse-rendering process.

The format of the BITMAPUTIL table is as follows:

    DW  offset GOTOGRAPHIC      ; Enter pixel mode on graphics hardware
    DW  offset EXITGRAPHIC      ; Leave pixel mode on graphics hardware
    DW  offset PUTPIXEL         ; Write a pixel to graphics hardware
    DW  offset GETPIXEL         ; Read a pixel from graphics hardware
    DW  offset GETPIXBYTE       ; Return a word containing pixel depth
    DW  offset SET_DRAW_PAGE    ; Select page in which to draw primitives
    DW  offset SET_VISUAL_PAGE  ; Set the page to be displayed
    DW  offset SET_WRITE_MODE   ; XOR Line Drawing Control

The parameters of these functions are as follows:

GOTOGRAPHIC         ; Enter pixel mode on the graphics hardware

    This function is used to enter the special Pixel Graphics mode.

EXITGRAPHIC         ; Leave pixel mode on the graphics hardware

    This function is used to leave the special Pixel Graphics mode.

PUTPIXEL            ; Write a pixel to the graphics hardware

    This function has the same format as the PUTPIXEL entry described
    previously.

GETPIXEL            ; Read a pixel from the graphics hardware

    This function has the same format as the GETPIXEL entry described
    previously.

GETPIXBYTE          ; Return a word containing the pixel depth

    This function returns the number of bits per pixel (color depth) of the
    graphics hardware in the AX register.

SET_DRAW_PAGE       ; Select alternate output graphics pages (if any)

    This function takes the desired page number in the AL register and
    selects alternate graphics pages for output of graphics primitives.

SET_VISUAL_PAGE     ; Select the visible alternate graphics pages (if any)

    This function takes the desired page number in the AL register and
    selects alternate graphics pages for displaying on the screen.

SET_WRITE_MODE      ; XOR Line drawing mode control.

    XOR Mode is selected if the value in AX is one, and disabled if the
    value in AX is zero.
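
The reason for the GOTOGRAPHIC and EXITGRAPHIC entries can be modeled in C.
This is a sketch only, not the Kernel's code: the real table holds 16-bit
offsets called through a register convention, whereas this model uses C
function pointers, and the names BitmapUtil, plot_points, and fake_driver
are hypothetical. It shows a caller entering pixel mode once, writing many
pixels, and leaving once, instead of paying the mode-switch cost per pixel.

```c
#include <assert.h>
#include <stddef.h>

/* C model of the first three BITMAPUTIL slots (illustration only). */
typedef struct {
    void (*goto_graphic)(void);                 /* enter pixel mode */
    void (*exit_graphic)(void);                 /* leave pixel mode */
    void (*put_pixel)(int x, int y, unsigned char color);
} BitmapUtil;

typedef struct { int x, y; } Point;

/* Plot n points with a single enter/leave of pixel mode. */
void plot_points(const BitmapUtil *u, const Point *pts, size_t n,
                 unsigned char color)
{
    size_t i;

    assert(u != NULL && (pts != NULL || n == 0));
    u->goto_graphic();
    for (i = 0; i < n; i++)
        u->put_pixel(pts[i].x, pts[i].y, color);
    u->exit_graphic();
}

/* Hypothetical fake driver that only counts calls, for demonstration. */
static int mode_switches, pixels_written;

static void fake_goto(void) { mode_switches++; }
static void fake_exit(void) { mode_switches++; }
static void fake_put(int x, int y, unsigned char color)
{
    (void)x; (void)y; (void)color;
    pixels_written++;
}

const BitmapUtil fake_driver = { fake_goto, fake_exit, fake_put };
```

Passing fake_driver and a small array of points to plot_points leaves
mode_switches at 2 no matter how many pixels are written.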
DW SAVEBITMAP ; Write from screen memory to system memory

Input:
    ES:BX       Points to the buffer in system memory to be written.
    ES:[BX]     contains the width of the rectangle -1.
    ES:[BX+2]   contains the height of the rectangle -1.
    CX          The upper left X coordinate of the rectangle.
    DX          The upper left Y coordinate of the rectangle.

Return:
    Nothing

The SAVEBITMAP routine is a block copy routine that copies screen pixels to system memory. The source rectangle has its upper left corner at (CX,DX); its width and height are given by the first two words of the buffer.

DW RESTOREBITMAP ; Write system memory to the screen

Input:
    ES:BX       Points to the buffer in system memory to be read.
    ES:[BX]     contains the width of the rectangle -1.
    ES:[BX+2]   contains the height of the rectangle -1.
    CX          The upper left X coordinate of the rectangle.
    DX          The upper left Y coordinate of the rectangle.
    AL          The pixel operation (write mode) to use when transferring the image into graphics memory:
                0: Overwrite mode
                1: XOR mode
                2: OR mode
                3: AND mode
                4: Complement mode

Return:
    Nothing

The RESTOREBITMAP vector loads screen pixels from system memory. The routine reads a stream of bytes from system memory into the rectangle whose upper left corner is (CX,DX) and whose width and height are given by the first two words of the buffer. The value in the AL register defines the mode that is used for the write. The following table defines the values of the available write modes:

    Pixel Operation     Code
    Overwrite mode      0
    Logical XOR         1
    Logical OR          2
    Logical AND         3
    Complement          4

DW SETCLIP ; Define a clipping rectangle

Input:
    AX    Upper Left X coordinate of clipping rectangle
    BX    Upper Left Y coordinate of clipping rectangle
    CX    Lower Right X coordinate of clipping rectangle
    DX    Lower Right Y coordinate of clipping rectangle

Return:
    Nothing

The SETCLIP vector defines a rectangular clipping region on the screen. The registers (AX,BX) - (CX,DX) define the clipping region.
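The RESTOREBITMAP write-mode codes listed above can be illustrated with a small C sketch that combines one source pixel with the pixel already on the screen. The function name apply_pixel_op and the color_mask parameter are hypothetical. The text does not define "Complement"; the sketch assumes it writes the bitwise inverse of the source pixel, which is only one plausible reading.

```c
#include <assert.h>

/* Pixel operation codes from the RESTOREBITMAP description. */
enum {
    OP_OVERWRITE  = 0,
    OP_XOR        = 1,
    OP_OR         = 2,
    OP_AND        = 3,
    OP_COMPLEMENT = 4
};

/* Combine a source pixel (src) with the pixel already on screen (dst).
   color_mask limits the result to the device's color depth,
   for example 0x0F for a 16-color device.
   ASSUMPTION: OP_COMPLEMENT writes the inverse of the source pixel. */
static unsigned char apply_pixel_op(unsigned char dst, unsigned char src,
                                    int op, unsigned char color_mask)
{
    unsigned char out;

    assert(op >= OP_OVERWRITE && op <= OP_COMPLEMENT);
    switch (op) {
    case OP_XOR:        out = (unsigned char)(dst ^ src); break;
    case OP_OR:         out = (unsigned char)(dst | src); break;
    case OP_AND:        out = (unsigned char)(dst & src); break;
    case OP_COMPLEMENT: out = (unsigned char)~src;        break;
    default:            out = src;                        break;
    }
    return (unsigned char)(out & color_mask);
}
```

XOR mode is the one commonly used for movable images: applying the same source twice restores the original screen contents.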
DW offset COLOR_QUERY ; Device Color Information Query

This vector inquires about the color capabilities of a given piece of hardware. A function code is passed into the driver in AL. The following function codes are defined:

>>> Color Table Size AL = 000h

Input:
    Nothing

Return:
    BX    The size of the color lookup table.
    CX    The maximum color number allowed.

The COLOR TABLE SIZE query determines the maximum number of colors supported by the hardware. The value returned in the BX register is the number of color entries in the color lookup table. The value returned in the CX register is the highest number for a color value. This value is usually the value in BX minus one; however, there can be exceptions.

>>> Default Color Table AL = 001h

Input:
    Nothing

Return:
    ES:BX --> default color table for the device

The DEFAULT COLOR TABLE function determines the color table values for the default (power-up) color table. The format of this table is a byte containing the number of valid entries, followed by the given number of bytes of color information.

Device driver construction particulars

The source code for a sample, albeit unusual, BGI device driver is included with this Toolkit to assist developers in creating their own. The demonstration driver is provided in two files, DEBVECT.ASM and DEBUG.C. This "Debug" driver doesn't actually draw graphics; it simply sends descriptive messages to the console screen (via DOS function call 9) upon receiving commands. Your own driver would be structured similarly but, instead of simply echoing commands, would access control ports and screen memory to perform each function.

Cookbook

    1. Compile or assemble the files required.
    2. Link the files together, making sure that the device vector table is the first module within the link.
    3. Run EXETOBIN on the resulting .EXE or .COM file to produce a .BIN file. There should be no relocation fixups required.
    4. Run program BH (provided with the toolkit) on the .BIN file to produce the .BGI file.

The resulting driver is now ready for testing. Examine the file TEST.C for an example of installing, loading, and calling a newly created device driver.

Examples

    ; To call any BGI function from assembly language, include the
    ; structure below and use the CALLBGI macro.

    CALLBGI MACRO   P
            MOV     SI,$&P                   ; PUT OPCODE IN (SI)
            CALL    CS:DWORD PTR BGI_ADD     ; BGI_ADD POINTS TO DRIVER
            ENDM

    ; e.g., to draw a line from (10,15) to (200,300):

            MOV     AX, 10
            MOV     BX, 15
            MOV     CX, 200
            MOV     DX, 300
            CALLBGI VECT

    ; To index any item in the status table, include the status table
    ; structures below and use the BGISTAT macro.

    BGISTAT MACRO   P                        ; GET ES:SI --> BGI STATUS
            LES     SI, CS:DWORD PTR STABLE  ; GET LOCATION OF STATUS TO SI
            ADD     SI, $&P                  ; OFFSET TO CORRECT LOCATION
            ENDM

    ; e.g., to obtain the aspect ratio of a device:

            BGISTAT ASPEC
            MOV     AX, ES:[SI]              ; (AX)= Y/X *10000

CHAPTER 6
________________________________________________________________________________

Borland Help system

This chapter defines the Borland Help system, including the source text file format, binary Help file format, and the run-time Help engine, all of which are necessary to support the following features:

    Resizable Help display window.
    Automatic wordwrapping during window resizing.
    Smooth scrolling between logically connected Help screens.
    Turbo Examples.
    Free moving cursor.

How do I use it?

You can use the information provided in this chapter to write Help for your own products. The Help Linker (HL.EXE) is provided on the disk that accompanies this book. The Help files it produces are compatible with THELP.COM, a utility provided with most Borland compilers. If you provide third-party libraries, you might want to offer reference material for those libraries in Borland Help so your customers can find information on your routines as easily as they do with Borland's own.
Wordwrap

The right margin for wrapping is based on the window width, and is independent of where the text is relative to the window. This means scrolling text horizontally through the window will not cause re-wrapping; only resizing the window causes re-wrap.

The value specified in field leftMargin of the binary file File Header Record is also applied to the right edge of the window when determining the right margin for wrapping, but not for truncation of non-wrapping text. Non-wrapping text is truncated at the physical right edge of the window.

Wrapping causes lines to move into and out of the display window at the bottom of the window only. It never affects lines above the wrapping line.

All hyphenated words in wrappable text must be removed from the Help source text.

Here are the rules for wrapping at run time (breaking a line into two or more lines when the Help display window is too narrow to display the complete line):

    For a line of Help text to be wrappable, it must begin with non-whitespace.
    Wrapping only occurs at whitespace, and leaves whitespace behind at the end of the wrapping line.
    For the purpose of wrapping, a keyword is treated as atomic, even if it contains whitespace.
    A line isn't wrapped if only whitespace is truncated from the right to fit the current window width.
    A line is truncated on the right (like non-wrapping text) if it doesn't contain whitespace that allows it to wrap.

Here are the rules for converting hard returns to soft returns (allowing text to flow from the next line to fill the current line to the right margin):

    A return at the end of a line that begins with whitespace is always hard.
    If the next character following a return (the first character of the next line) is non-whitespace, then the return is soft; if the next character is whitespace, then the return is hard.

These rules allow the existing Help text to wrap correctly with little or no change.
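The two hard/soft return rules can be expressed as a short C sketch. The function names are hypothetical, and two points the rules leave open are filled in as labeled assumptions: an empty line is treated as beginning with whitespace, and the return after the last line is treated as hard.

```c
#include <assert.h>
#include <stddef.h>

/* Whitespace for this sketch is a space or a tab.
   ASSUMPTION: an empty line counts as beginning with whitespace. */
static int starts_with_space(const char *line)
{
    return line[0] == '\0' || line[0] == ' ' || line[0] == '\t';
}

/* Returns 1 if the return ending lines[i] is soft (text from the next
   line may flow up to fill this one), 0 if it is hard.
   ASSUMPTION: the return after the last line is hard. */
static int return_is_soft(const char *const lines[], size_t count, size_t i)
{
    assert(i < count);
    if (starts_with_space(lines[i]))
        return 0;                              /* rule 1: always hard */
    if (i + 1 == count)
        return 0;                              /* nothing to flow up */
    return !starts_with_space(lines[i + 1]);   /* rule 2 */
}
```

Under these rules an indented line (such as a code sample) keeps its own returns, while consecutive lines that start in column one flow together as a paragraph.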
Smooth scroll within topics

All pages linked through the upContext and downContext fields of a keyword record are considered to be a single contiguous stream of text. Also, a single context (or screen) can contain any number of lines of text.

Turbo Example copy

A Turbo Example is a block of text in a Help screen that is set up for copying to the Clipboard. A single hot key copies the example to the Clipboard.

Only one Turbo Example is allowed per Help topic, where a topic is defined as the set of all contexts (screens, pages) joined through the upContext/downContext fields of a keyword record.

A Turbo Example is surrounded by ^E (0x05) characters in the context text. Keywords cannot be nested in Turbo Examples and vice versa.

The text of a Turbo Example can extend over several contexts (screens, pages), and can include both wrapping and non-wrapping text. A special display attribute is defined to highlight Turbo Example text.

When copying a Turbo Example to the Clipboard, wrapping text is converted to fixed text by replacing soft returns with hard returns. The line in the example text with the least leading whitespace defines the left margin: that amount of leading whitespace is deleted from every line of the example text as it is copied to the Clipboard. Trailing whitespace is also deleted from all lines. For example, if the Turbo Example text is

    " void main( void ) { "
    " printf( "Hello world\n" ); "
    " }"

this is what gets copied to the Clipboard:

    "void main( void ) {"
    "printf( "Hello world\n" );"
    "}"

Summary of keyboard and mouse interaction

Following is a summary of keyboard and mouse usage supported by the run-time engine while the Help window is active.

UpArrow
    Moves cursor up one row in current column. If the cursor is already at the top of the window, scrolls the text down one row in the window; if the cursor is at the top of the topic text, ignores the command.
DownArrow
    Moves cursor down one row in current column. If the cursor is already at the bottom of the window, scrolls the text up one row in the window; if at the bottom of the topic text, ignores the command.

LeftArrow
    Moves cursor left one column on current row. If the cursor is already at the left edge of the window, scrolls the text right horizontally by one column; if at the left edge of the topic text, ignores the command.

RightArrow
    Moves cursor right one column on current row. If the cursor is already at the right edge of the window, scrolls the text left horizontally by one column. The text can be scrolled left until column MaxHelpColumn is in the rightmost column of the Help window.

CtrlLeftArrow
    Moves cursor left to the start of the previous word. A word is defined as a sequence of any of the following characters: (a..z), (A..Z), (0..9), or (_, $, #). If no further words remain on the current row, looks for a word starting at the end of the previous row; if there's no previous row, ignores the command. Scrolls the text in the window as necessary to keep the cursor in the window.

CtrlRightArrow
    Like CtrlLeftArrow, except moves the cursor right to the start of the next word.

Home
    Moves cursor to first non-whitespace character of current row, scrolling the topic text horizontally in the window if necessary; if the row is all whitespace, moves to column 1.

End
    Moves cursor to one column past last non-whitespace character of current row, scrolling the topic text horizontally in the window if necessary.

PgUp
    Scrolls topic text down in the window by the number of lines displayable in the window, or by the number of lines remaining to the top of the topic text, whichever is less. The cursor position is not affected.

PgDn
    Scrolls topic text up in the window by the number of lines displayable in the window, or by the number of lines remaining to the bottom of the topic text, whichever is less. The cursor position is not affected.
Shift
    If the Shift key is held down while one or more of the previous cursor control keys are pressed, a block of Help text is selected. The block runs from the character position at which the cursor was originally positioned up to, but not including, the final resting position of the cursor. The block is highlighted as the cursor is moved. The block remains in effect until a cursor control key is pressed without the Shift key, or until it is copied to the Clipboard.

Tab
    Selects the next keyword in the current topic text. If the last keyword in the topic is currently selected, selects the first keyword in the topic. If there are no keywords in the topic, ignores the command. If the next keyword is not currently displayed in the window, scrolls the window horizontally and/or vertically to place the keyword text just inside the window.

ShiftTab
    Like Tab, except selects the previous keyword.

Enter
    If a selected keyword is currently displayed in the Help window, switches to its context. If no keyword is displayed (even though one or more exist elsewhere in the topic text), ignores the command.

Any other key
    Is used for incremental searching between keywords in the topic text.

Clicking
    Moves the cursor to the mouse cursor position, and cancels selected text, if any. If the mouse cursor is on a keyword, the keyword becomes the active keyword.

Shift+clicking
    Extends the current block of selected text to the cursor position.

Double clicking
    Moves the cursor to the mouse cursor position, and cancels selected text, if any. If the cursor is not positioned on a keyword, does an index search for the token the cursor is currently positioned on and, if a match is found, switches contexts. If the cursor is on a keyword, switches to the keyword's context. A "token" is defined the same as a word for cursor movements (see the description of CtrlLeftArrow).
Right button
    No action is defined for the right mouse button in the Help window.

Dragging
    Dragging the mouse in the Help window is equivalent to moving the cursor with the arrow keys while holding down the Shift key; that is, it selects text while allowing horizontal and vertical scrolling.

Scroll bars
    Scroll bars are supported in the usual manner for scrolling Help topic text within the window.

F1
    Switches to the context specified by the mainIndexScreen field of the File Header Record.

AltF1
    If a previous context has been recorded, switches to it; otherwise switches to the mainIndexScreen context.

CtrlF1
    If the cursor is not positioned on a keyword, does an index search for the token the cursor is currently positioned on and, if a match is found, switches contexts. If the cursor is on a keyword, switches to the keyword's context. A "token" is defined the same as a word for cursor movements (see the description of CtrlLeftArrow).

Esc
    Closes the Help window.

Menu options

Two Edit menu options apply when Help is active:

Copy
    Copies the currently selected text from the topic text to the Clipboard. If no text is currently selected, the command is disabled (grayed in the menu). The text is "unselected" after the copy. The rules for coercing text during a Turbo Example copy (noted earlier) also apply during a generalized copy to the Clipboard.

Copy Example
    Copies the Turbo Example text from the current topic text, if any, to the Clipboard. If the current topic has no Turbo Example, the command is disabled (grayed in the menu).

Incremental searching

Incremental searching is supported for movement between keywords in topic text. Literal characters entered at the keyboard are matched against successive characters in the text of keywords, and the selected keyword is changed based on the characters entered. Backspace strips successive characters from the match string. Explicit cursor movements cancel the incremental search.
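A minimal C sketch of incremental searching as described above follows. The text does not specify the matching details, so several choices here are assumptions: matching is a case-insensitive prefix match, the search always starts from the first keyword in the topic, and a key that matches no keyword is rejected. All names (IncSearch, inc_search_key, and so on) are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MATCH_MAX 32

typedef struct {
    char   match[MATCH_MAX + 1];   /* characters typed so far */
    size_t len;
    int    selected;               /* index of the selected keyword */
} IncSearch;

/* Case-insensitive test of the first n characters (ASSUMPTION). */
static int has_prefix(const char *s, const char *prefix, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (s[i] == '\0' ||
            toupper((unsigned char)s[i]) != toupper((unsigned char)prefix[i]))
            return 0;
    return 1;
}

/* Select the first keyword that starts with the match string. The
   selection is left alone if the string is empty or nothing matches.
   Returns 1 if a keyword matched. */
static int reselect(IncSearch *st, const char *const kw[], int count)
{
    int i;
    if (st->len == 0)
        return 0;
    for (i = 0; i < count; i++)
        if (has_prefix(kw[i], st->match, st->len)) {
            st->selected = i;
            return 1;
        }
    return 0;
}

/* Feed one key: '\b' strips a character, anything else extends the match. */
static void inc_search_key(IncSearch *st, const char *const kw[], int count,
                           char key)
{
    assert(count > 0);
    if (key == '\b') {
        if (st->len > 0)
            st->match[--st->len] = '\0';
        reselect(st, kw, count);
    } else if (st->len < MATCH_MAX) {
        st->match[st->len++] = key;
        st->match[st->len] = '\0';
        if (!reselect(st, kw, count))      /* reject keys that match nothing */
            st->match[--st->len] = '\0';
    }
}

/* An explicit cursor movement cancels the search. */
static void inc_search_cancel(IncSearch *st)
{
    st->len = 0;
    st->match[0] = '\0';
}
```

A run-time engine would call inc_search_key for each literal key and inc_search_cancel whenever an arrow key, Tab, or mouse click moves the cursor.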
Index context

A special context code (;INDEX) is recognized by the Help system that maps onto an internally generated topic. The topic consists of all entries in the index table of the Help file; index entries are stored as keywords. The user can then use any of the normal means of moving between these index keywords, and switch to contexts referenced in the index table.

Creating online Help text

First and foremost rule: Any command that you use in the Help file must be immediately preceded by a semicolon (;). Letter case does not matter unless you're using the ;CASESENSE command.

Second rule: You must put hard returns at the end of your lines.

There are several (optional) initial setup commands that you can place at the beginning of your Help files.

    ;CASESENSE causes Help index entries and screen names to be case sensitive.
    ;STAMP places a (usually human-readable) ID stamp in the Help file to identify it as a Help file.
    ;SIGNATURE places another ID stamp in the Help file.
    ;VERSION codes a version number into the Help file.

Recommended practice is to put these setup commands in a separate file and always include that file first when you create Help.

An example

Here is an example of the typical commands you'd use in a single Help screen format. Note that the "^B"s shown here stand for the actual ^B character (0x02).

    ;COMMENT I can place this here; it won't appear
    ;COMMENT when you bring up the Help file
    ;SCREEN waditdo
    Turbo Dictionary

    When you select one of the items on this menu, you can
    learn everything you've ever wanted to know about it until
    you think you're going to implode with knowledge. Your
    choices include:

    ^BAnnouncer     ^B
    ^BArchitect     ^B
    ^BGame show host^B
    ^BPlumber       ^B

    You'll want to use this command after a particularly long
    night of partying when you need something titillating to
    keep you awake or possibly to fool some higher-up into
    thinking that you're really working.
    ;KEYWORD don
    ;KEYWORD art
    ;KEYWORD dailydouble
    ;KEYWORD potpourri
    ;INDEX Dictionary
    ;ENDSCREEN

Here's an explanation of each command used in the previous example:

;COMMENT
    ;COMMENT is an optional command you can use when you want to make a note to yourself (or anyone else reading the file) about that particular Help screen (or anything else for that matter). There's no limit to how many ;COMMENTs you can put in a file. You can also use ;COMMENT to keep track of modifications and authors. Naturally, comment text doesn't appear in the final Help file.

;SCREEN
    ;SCREEN marks the beginning of each new Help screen. The name given in this command names the screen that Help searches for when the user selects a keyword. (See the ;KEYWORD command, below.)

;KEYWORD
    ;KEYWORD is an optional command that defines which Help screen to bring up when the user selects the matching keyword. In effect, the associated keyword is a cross-reference, much like those in an encyclopedia or thesaurus: in defining or explaining an entry, these reference books may highlight or capitalize other related entries, or tell you to "See" another related entry.

    When the user calls up Help, all keywords appear highlighted. You can move around the keywords using the Up arrow, Down arrow, Right arrow, and Left arrow keys. The keyword you're positioned on is highlighted; to select it, press Enter.

Here's another example:

    ;SCREEN metaphysics
    Metaphysics

    Metaphysics is a branch of philosophy concerned with the
    ultimate nature of existence. Ontology (the study of the
    nature of being), cosmology, and philosophical theology
    are usually considered its main branches. The term comes
    from the metaphysical treatises of Aristotle, who presented
    the First Philosophy (as he called it) after the Physics.
    See also
     ^B Kant     ^B
     ^B Fichte   ^B
     ^B Schelling^B
     ^B Hegel    ^B
    ;KEYWORD kant
    ;KEYWORD fichte
    ;KEYWORD schelling
    ;KEYWORD hegel
    ;INDEX Metaphysics
    ;ENDSCREEN

    ;SCREEN kant
    Kant

    1724-1804 German philosopher, one of the greatest figures
    in the history of ^Bmetaphysics^B. Kant proposed that
    objective reality is known only insofar as it conforms to
    the essential structure of the knowing mind. Only objects
    of experience (phenomena) may be known, whereas things
    lying beyond experience (noumena) are unknowable, even
    though in some cases we assume a priori knowledge of them.
    The existence of such unknowable "things-in-themselves" can
    be neither confirmed nor denied, nor can they be
    scientifically demonstrated.
    ;KEYWORD metaphysics
    ;INDEX Kant
    ;ENDSCREEN

Notice that screen metaphysics has four keywords: Kant, Fichte, Schelling, and Hegel. For the sake of brevity, only one of the screens connected to metaphysics is shown here: screen kant.

Note that the ^B's are shown as two separate characters, but in the Help source they must be the actual ^B character (0x02).

Each keyword within the screen text is delimited by ^B's and has a matching ;KEYWORD command, so that the Help Linker knows which screen a given keyword is to bring up when selected. Read the following section, "More about ^B's," for further explanation.

This example shows the keywords formatted as a single column (which will wrap to multiple columns when the Help window is wide enough). You can also use keywords within the text of a paragraph. Whatever the keyword happens to be, its beginning and ending ^B's must be on the same line; the Help Linker gives an error if you try to wrap a keyword across two lines.

;ENDSCREEN
    ;ENDSCREEN ends the screen you began with ;SCREEN; it takes no argument.

;PAGE
    ;PAGE is a linking command between two or more Help screens of related information. Pressing PgUp takes you to the previous screen; PgDn takes you to the next screen. A good example of ;PAGE can be found on disk.
Compiling and linking online Help

Help Linker command line syntax:

    hl {inputFile | @respFile} [/ooutFile] [/eerrorLimit] [/x]

where

    [p]    means p is optional.
    {p}    means zero or more repetitions of p.
    p|q    means choose p or q.

Parameters can appear in any order.

inputFile
    The name of a Help text file. Any command line parameter not beginning with a "/" is assumed to be an input file specification, and any number can appear on the command line. If no path is specified, the file is taken from the current directory.

@respFile
    respFile is the path/name of a response file containing the names of Help text input files. The file can specify any number of input files. Each file should be listed on a separate line in the file. Lines beginning with a semicolon (;) are ignored and can be used for comments. If no path is specified, the file is taken from the current directory. Any number of response files can be specified on the command line; however, response files cannot be nested.

    Note: DOS file wildcards can be used in any inputFile specification, either on the command line or in a response file.

/ooutFile
    outFile is the path/name of the file into which the compiled Help data is to be stored. If this parameter is missing, the data is stored in TCHELP.TCH in the current DOS work directory.

/eerrorLimit
    errorLimit is the number of errors that need to be detected before the Help Linker will terminate without completing the link operation. If the parameter is missing, the Linker will terminate on any error.

/x
    If this switch is present, the Help Linker will not automatically create and insert an index table screen in the resulting binary Help file. Since THELP automatically creates an index screen "on-the-fly," omitting the /x switch only results in a larger Help file.

Binary Help file format

The Binary Help File consists of a sequence of records. All records are mandatory, and the sequence of the records is significant.
The records of the file are grouped into four major sections as follows:

    Administrative
        File Stamp
        File Signature
        File Version
        File Header Record
        Compression Record
    Context Table
    Index Table
    Context Descriptions: A series of 1 or more pairs of records:
        Text Record
        Keyword Record

The administrative records help to identify the file as a valid Help file, and provide information necessary to interpret the remaining records of the file.

The Context Table is a table defining every individually addressable "chunk" of Help text. Each Context is given a unique identification number which happens to be a direct index into the Context Table. The indexed element of the table gives an absolute offset into the Help file where a complete description of the context can be found.

The Index Table is a sorted list of text labels, each with an associated Context Number. The Index Table allows Contexts to be referenced via a text label.

The fourth and final area is the Context Descriptions. This is a list of one or more pairs of Text and Keyword Records. The Text Records give the actual text associated with each context, and they are directly addressed by the elements of the Context Table. All Text Records have an associated Keyword Record which defines linkage to other Contexts, as well as cross reference keywords embedded in the context text.

Each file record type is described in detail in the remaining sections of this document. In the following sections, assume the following definitions:

    typedef unsigned char  byte;
    typedef unsigned short word;

File Stamp

An ASCIIZ string identifying the file in "human readable" terms. For example, the following strings are used in Turbo C++ and Turbo Pascal respectively:

    TURBO C Help FILE.\0
    TURBO PASCAL Help FILE.\0

The terminating null character is followed by a DOS End-of-File character (0x1A), so that a user attempting to "TYPE" the Help file under DOS will simply see the File Stamp string displayed.
The text of this string is defined using the ;STAMP command in Help source text processed by the Help Linker.

File Signature

An ASCIIZ string that helps to further identify a file as a valid Borland Help file. The string may be any value mutually agreed between the author of the Help text and the programmer of the run-time code. The value currently used by Borland language products is:

    $*$* &&&&$*$

The text of this string is defined by the ;SIGNATURE command in Help source text processed by the Help Linker.

File Version

Two bytes that define the version of the Help Format, and of the Help File Text, respectively:

    typedef struct {
        byte formatVersion;
        byte textVersion;
    } TPversionRec;

formatVersion defines the version of the Help file format. It allows the run-time code to test that its reader is capable of reading the Help file. This version code is hard-coded into both the Help Linker and the run-time code, and is updated when the file format is revised. The format defined in this document requires that field formatVersion be set to 52.

Field textVersion defines the version of the text (i.e., contents) of the Help file. The value is set using command ;VERSION in the Help source text processed by the Help Linker. The run-time code of Borland language products currently ignores this value.

Record Headers

The remaining records of a Help file have a common format which includes a header identifying the record's type and its length:

    typedef struct {
        byte recType;
        word recLength;
    } TPrecHdr;

Field recType is a code which identifies the record type. The following record types are currently defined, and each is explained in further detail in the sections which follow:

    enum {
        RT_FileHeader  = 0,
        RT_Context     = 1,
        RT_Text        = 2,
        RT_Keyword     = 3,
        RT_Index       = 4,
        RT_Compression = 5
    };

Field recLength gives the length of the contents of the record in bytes, not including the record header. The contents begin with the first byte following the header.
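As an illustration of the record header layout, the following C sketch scans a memory image of the record area for a record of a given type. It is a sketch under stated assumptions, not Borland's reader: it assumes the header occupies exactly three bytes on disk with no padding, that recLength is stored least significant byte first, and that the caller has already skipped the File Stamp, File Signature, and File Version. The names read_word and find_record are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char  byte;
typedef unsigned short word;

enum {
    RT_FileHeader = 0, RT_Context = 1, RT_Text = 2,
    RT_Keyword = 3, RT_Index = 4, RT_Compression = 5
};

/* ASSUMPTION: one recType byte plus a two-byte recLength, unpadded.
   Reading TPrecHdr directly with a modern compiler may insert padding,
   so the fields are picked out of the byte stream by hand. */
#define REC_HDR_SIZE 3

/* Read a 16-bit word stored least significant byte first (ASSUMPTION). */
static word read_word(const byte *p)
{
    return (word)(p[0] | (p[1] << 8));
}

/* Starting at offset pos, find the next record of type wanted.
   Returns the offset of the record's contents and stores its length,
   or returns (size_t)-1 if no such record is found or a record would
   run past the end of the buffer. */
static size_t find_record(const byte *buf, size_t size, size_t pos,
                          int wanted, word *length)
{
    assert(buf != NULL && length != NULL);
    while (pos + REC_HDR_SIZE <= size) {
        byte   type = buf[pos];
        word   len  = read_word(buf + pos + 1);
        size_t body = pos + REC_HDR_SIZE;

        if (body + len > size)
            break;                     /* truncated record */
        if (type == wanted) {
            *length = len;
            return body;
        }
        pos = body + len;              /* skip to the next header */
    }
    return (size_t)-1;
}
```

Because every record carries its own length, a reader can skip record types it does not understand, even though existing products expect the fixed ordering.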
Note that while this record structure allows for an arbitrary ordering of records within the file, the existing Borland language products assume a fixed record ordering, which is the same order used to describe the records in the following sections.

File Header Record

Defines various parameters and options common to the entire Help file.

    typedef struct {
        word options;
        word mainIndexScreen;
        word maxScreenSize;
        byte height;
        byte width;
        byte leftMargin;
    } TPfileHdrRec;

options
    options is a bitmapped field that lets you select various options. Only one is currently supported.

    OF_CaseSense (0x0004)
        If set, index tokens are listed in mixed case in the Index Record, and index searches should be case sensitive. If cleared, index tokens are all uppercase in the Index Record, and index searches should ignore case. Set by the ;CASESENSE command in Help source text processed by the Help Linker.

mainIndexScreen
    The context number of the context designated by the ;MAININDEX command in the Help source text processed by the Help Linker. If ;MAININDEX wasn't used, mainIndexScreen is set to zero.

maxScreenSize
    The number of bytes in the longest Text Record in the file (not including its header). This field is not currently used.

height, width
    The default size in rows and columns, respectively, of the display area of a Help window. Set using the ;HEIGHT and ;WIDTH commands in Help source text processed by the Help Linker.

leftMargin
    The number of columns to leave blank on the left edge of all rows of Help text displayed. Set using the ;LMARGIN command in Help source text processed by the Help Linker.

Compression Record

Defines how the contents of Text Records are encoded. The record has the following general form:

    typedef struct {
        byte compType;
        byte charTable[ 14 ];
    } TPcompRec;

compType is a code that identifies the type of compression used. Nibble encoding (CT_Nibble) is the only compression method currently supported.
    enum { CT_Nibble = 2 };

The text of a Text Record is encoded as a stream of nibbles. The nibbles are stored sequentially in the bytes of the text record; the low nibble of a byte logically precedes the high nibble of the byte in the nibble stream.

Nibble values (0x0...0xD) are direct indexes into the charTable field of the Compression Record. The indexed entry is the literal character represented by the nibble. The Help Linker fills this table with the most frequent characters in the text, with one exception: element 0 of the table always maps to a byte value of 0.

The remaining two nibble values have special meanings:

    enum {
        NC_RawChar = 0xF,
        NC_RepChar = 0xE
    };

Nibble code NC_RawChar introduces two additional nibbles which define a literal character; the least significant nibble appears first.

Nibble code NC_RepChar defines a repeated sequence of a single character. The next nibble gives the repeat count less two (that is, counts from 2 to 17 are possible). The nibbles that follow define the character to repeat; the repeat character may be either a single nibble in the range (0x0 .. 0xD) representing an index into charTable, or a three-nibble NC_RawChar sequence.

Context table

A table of absolute file offsets which relates Help contexts with their associated text. The first word of the record gives the number of contexts in the table. The remainder of the record is a table of n (n given by the first word) 3-byte integers (LSByte first). The table is indexed by context number (0 to n-1). The 3-byte integer at a given index is an absolute byte offset in the Help file where the text of the associated context begins. The 3-byte integer is signed (2's complement). Two special values are defined:

    -1    Use Index Screen text (defined in File Header Record).
    -2    No Help is available for this context.

Context Table entry 0 is not used.

Index table

A list of index descriptors.
An index is a token (normally a word or name) that has been explicitly associated with a context using the ;INDEX command in the source text processed by the Help Linker. More than one index may be associated with a context, but any given index cannot be associated with more than one context.

The list of index descriptors in the Index Record allows the text of an index token to be mapped into its associated context number.

The first word of the record gives the number of indexes defined in the record. The remaining bytes of the record are grouped into index descriptors. The descriptors are listed in ascending order based on the text of the index token (normal ASCII collating sequence). If the OF_CaseSense flag is not set in the options field of the File Header Record, all indexes are in uppercase only.

Each index descriptor is of the following form:

    byte lengthCode;
    byte uniqueChars[ 1 .. n ];
    word contextNumber;

The bits of lengthCode are divided into two bit fields. Bits (7..5) specify the number of characters to carry over from the start of the previous index token string. Bits (4..0) specify the number of unique characters to add to the end of the inherited characters. Field uniqueChars gives the n unique characters to add.

For example, if the previous index token was addition, and the next index token is advanced, we would inherit two characters from the previous token (ad), and add six unique characters (vanced); thus, lengthCode would be 0x46.

contextNumber gives the context number of the context associated with the index. This number is an index into the Context Table described earlier.

Text Record

Defines the compressed text of a context. Text Records and Keyword Records (described below) appear in pairs; one pair for each context in the Help file. The Text Record always precedes its associated Keyword Record. Text Records are addressed in the Help file through file offset values found in the Context Table.
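To make the nibble encoding of Text Records concrete, here is a C sketch of a decoder that follows the Compression Record rules: low nibble first, values 0x0 to 0xD index charTable, NC_RawChar introduces a literal character, and NC_RepChar introduces a repeat count less two. It is an illustration rather than the run-time engine's code; the names NibbleReader, next_nibble, next_char and decode_text, and the error-handling choices, are assumptions.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char byte;

enum { NC_RepChar = 0xE, NC_RawChar = 0xF };

typedef struct {
    const byte *data;      /* compressed bytes of the Text Record */
    size_t      nibbles;   /* nibbles available (2 * recLength) */
    size_t      pos;       /* index of the next nibble */
} NibbleReader;

/* Fetch the next nibble; the low nibble of each byte comes first.
   Returns 0 past the end so truncated input still terminates. */
static unsigned next_nibble(NibbleReader *r)
{
    byte b;
    if (r->pos >= r->nibbles)
        return 0;
    b = r->data[r->pos / 2];
    return (r->pos++ % 2 == 0) ? (b & 0x0Fu) : (unsigned)(b >> 4);
}

/* Decode one character whose first nibble has already been read:
   either a charTable index or an NC_RawChar sequence. */
static byte next_char(NibbleReader *r, const byte charTable[14], unsigned first)
{
    unsigned lo, hi;
    if (first != NC_RawChar) {
        assert(first < 14);
        return charTable[first];
    }
    lo = next_nibble(r);               /* least significant nibble first */
    hi = next_nibble(r);
    return (byte)(lo | (hi << 4));
}

/* Decode a nibble-encoded Text Record into out. Returns the number of
   bytes produced, or (size_t)-1 if out is too small or the stream is
   malformed (a repeat code where a character was expected). */
static size_t decode_text(const byte *data, size_t recLength,
                          const byte charTable[14], byte *out, size_t outSize)
{
    NibbleReader r;
    size_t n = 0;

    r.data = data;
    r.nibbles = recLength * 2;
    r.pos = 0;
    while (r.pos < r.nibbles) {
        unsigned nib = next_nibble(&r);
        unsigned count = 1;
        byte ch;

        if (nib == NC_RepChar) {
            count = next_nibble(&r) + 2;   /* stored as count less two */
            nib = next_nibble(&r);
            if (nib == NC_RepChar)
                return (size_t)-1;
        }
        ch = next_char(&r, charTable, nib);
        if (count > outSize - n)
            return (size_t)-1;
        while (count-- > 0)
            out[n++] = ch;
    }
    return n;
}
```

Note that an unused trailing nibble of 0 decodes through charTable[0] to a 0 byte, which is why element 0 of the table is reserved.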
The recLength field of the Text Record's header defines the number of bytes of compressed text in the record. The Compression Record defines how the text is compressed. If the text record is nibble encoded, and the last nibble of the last byte is not used, it is set to 0 - this translates to a 0 byte when the text is decoded, and the 0 byte represents a blank line.

Lines of text comprising the Text Record are stored as ASCIIZ strings.

Keyword Record

Defines keywords embedded in the preceding Text Record, and identifies related Text Records. The record begins with the following fixed fields:

    word upContext;
    word downContext;
    word keywordCnt;

upContext and downContext give the context numbers of the previous and next sections of text in a sequence, respectively. Either may be zero, indicating the end of the context chain.

keywordCnt gives the number of keywords encoded in the associated Text Record. Immediately following this field is an array of keywordCnt Keyword Descriptor Records of the following form:

    typedef struct {
        word kwContext;
    } TPkwDesc;

The keywords in a Text Record are numbered from 1 to keywordCnt in the order they appear in the text (reading left to right, top to bottom). kwContext is a context number (index into the Context Table) indicating which context to switch to if this keyword is selected by the user.

CONTENTS
______________________________________________________________________

Introduction 1
  Why open architecture? . . . 1
  Borland language tools . . . 2
  How to use this book . . . 2
  Tools discussed . . . 2
  Accompanying software . . . 3
  A brief disclaimer . . . 3

Chapter 1 C++ object mapping 5
  Nonstatic data members . . . 5
  Nonvirtual base classes . . . 5
  Virtual base classes . . . 6
  Empty classes . . . 10
  Addressing of class instances and this . . . 10
  Virtual table pointers . . . 10
  Virtual tables . . . 11
  Virtual function calls, virtual thunks . . . 11
  Calling conventions for member functions . . . 11
  Pointers to class members . . . 12
    Pointers to data members . . . 12
    Pointers to function members . . . 13
  Static data members . . . 14
  _export classes . . . 14
  Passing classes by value . . . 14
  Initialization and finalization of nonlocal static objects . . . 14
  Conventions for constructors and destructors . . . 14
    Constructors . . . 14
    Destructors . . . 15
  RTL helper functions . . . 15
  Name mangling . . . 18
    Encoding of nested and template classes . . . 19
    Encoding of function names . . . 19
      Ordinary functions . . . 19
      Constructors, destructors, and overloaded operators . . . 20
      Type conversions . . . 21
    Encoding of arguments . . . 21
  Dynamically dispatchable virtual tables . . . 23

Chapter 2 Object file contents 25
  Turbo object file comment records . . . 26
    0x00 Compiler identification . . . 26
    0xe0 External symbol type index . . . 26
    0xe1 Public symbol type index . . . 27
    0xe2 Structure member definition . . . 27
    0xe3 Type definition . . . 29
      Simple types . . . 32
      Pascal string type . . . 32
        TID_PSTR . . . 32
      Labels . . . 32
        TID_LABEL . . . 32
      Integral range types . . . 32
      Cobol-style BCD . . . 33
        TID_BCDCOB . . . 33
      Pointer types . . . 33
        TID_NEAR and TID_NEAR386 . . . 33
        TID_FAR and TID_FAR386 . . . 33
        TID_SEG . . . 34
        TID_NREF . . . 34
        TID_FREF . . . 34
      Array types . . . 34
        TID_CARRAY . . . 34
        TID_VLARRAY . . . 34
        TID_PARRAY . . . 34
      Very large structure types . . . 35
        TID_VLSTRUCT and TID_VLUNION . . . 35
      Enumerated types . . . 35
        TID_ENUM and TID_PENUM . . . 35
      Function types . . . 35
        TID_FUNCTION . . . 35
      Sets . . . 36
        TID_SET . . . 36
      Binary files . . . 36
        TID_BFILE . . . 36
      Member/duplicate functions . . . 36
        TID_SPECIALFUNC . . . 36
      C++ Class . . . 36
        TID_CLASS . . . 37
      Pointed-to members . . . 37
        TID_MEMBERPTR . . . 37
      New style pointed-to members . . . 37
        TID_NEWMEMBERPTR . . . 37
    0xe4 Enum member definition . . . 37
    0xe5 Begin scope record . . . 38
    0xe6 Locals definition record . . . 38
      SC_TYPEDEF (6) and SC_TAG (7) . . . 38
      SC_STATIC (0) . . . 39
      SC_ABSOLUTE (1) . . . 39
      SC_AUTO (2) and SC_PASVAR (3) . . . 39
      SC_REGISTER (4) . . . 39
      SC_CONST (5) . . . 39
      SC_OPT (8) . . . 40
        SC_AUTO and SC_PASVAR . . . 40
        SC_REGISTER . . . 40
    0xe7 End of scope . . . 41
    0xe8 Select source file . . . 41
    0xe9 Dependency file definition . . . 41
    0xea Compile parameters record . . . 42
    0xeb External symbol matched type index . . . 43
    0xec Public symbol matched type index . . . 43
    0xed Class definition . . . 44
      Class descriptions . . . 44
    0xee Coverage offset record . . . 45
    0xf5 Begin large scope record . . . 45
    0xf6 Large offset locals definition record . . . 46
      SC_STATIC (0) . . . 46
      SC_ABSOLUTE (1) . . . 46
      SC_AUTO (2) and SC_PASVAR (3) . . . 47
    0xf7 Large end of scope . . . 47
    0xf8 Member function . . . 47
    0xf9 Debug Information Version . . . 47
    0xfa Module optimization flags . . . 47
  .OBJ extensions for 32 bits . . . 48
  VIRDEF Records . . . 49

Chapter 3 Symbol table format 51
  Symbols . . . 54
  Modules . . . 57
  Source files . . . 58
  Line numbers . . . 59
  Scopes . . . 60
  Segments . . . 60
  Segment/source file correlations . . . 61
  Types . . . 62
    Simple types and common fields . . . 62
    Pascal strings (12 bytes) . . . 63
    Ranges (24 bytes) . . . 63
    BCD COBOL (12 bytes) . . . 64
    Pointers (12 bytes) . . . 64
    C arrays (12 bytes) . . . 64
    Very large arrays (12 bytes) . . . 65
    Pascal arrays (24 bytes) . . . 65
    Structs and unions (12 bytes) . . . 65
    Very large structs and unions (24 bytes) . . . 65
    Enums (24 bytes) . . . 66
    Functions (12 bytes) . . . 66
    Labels (12 bytes) . . . 66
    Sets (12 bytes) . . . 66
    Binary files (12 bytes) . . . 66
    Function prototypes (24 bytes) . . . 67
    Special functions (24 bytes) . . . 67
    Classes (12 bytes) . . . 68
    Member pointers (24 bytes) . . . 68
    Near and far references (24 bytes) . . . 69
  Members . . . 75
    Structure and union members . . . 75
  Class table . . . 76
    Special cases . . . 78
  Parent table . . . 78
  Scope class table . . . 78
  Module class table . . . 79
  Coverage offset map table . . . 79
  Coverage offset table . . . 80
  Browser definition table . . . 80
  Optimized symbol table . . . 80
  Module Optimization Flags Table, Reference Information Table . . . 81
  Names . . . 82
  Debugging Turbo Pascal overlays . . . 82

Chapter 4 Project file format 85
  Project file utilities . . . 85
    How the utilities work . . . 85
    Using the examples . . . 86
      Show overview (-o) . . . 87
      Show modules (-p) . . . 87
      Show modules with dependencies (-P) . . . 87
      Show options (-t) . . . 87
      Set options (-s) . . . 87
      Show note (-n) . . . 87
      Show header (-h) . . . 87
      TRANCOPY syntax . . . 87
      STRIPPRJ syntax . . . 88
  Format of the Project file . . . 88
    Header information . . . 89
    Sections in the project file . . . 89
      Block Type 50--Options section . . . 90
      Block Type 51--Header section . . . 90
      Block Type 10--Transfer section . . . 92
      Block Type 52--Note section . . . 92
      Block Type 53--Module section . . . 93
      Block Type 54--Dependency section . . . 94
      Block Type 55--Extension section . . . 98

Chapter 5 The BGI driver toolkit 101
  BGI run-time architecture . . . 101
  BGI Graphics Model . . . 102
  Header Section . . . 102
  The driver status table . . . 103
  The device driver vector table . . . 104
  Vector Descriptions . . . 105
  Device driver construction particulars . . . 117
  Cookbook . . . 117
    Examples . . . 118

Chapter 6 Borland Help system 119
  How do I use it? . . . 119
  Wordwrap . . . 119
  Smooth scroll within topics . . . 120
  Turbo Example copy . . . 120
  Summary of keyboard and mouse interaction . . . 121
    Menu options . . . 124
    Incremental searching . . . 124
    Index context . . . 125
  Creating online Help text . . . 125
    An example . . . 125
    ;COMMENT . . . 126
    ;SCREEN . . . 126
    ;KEYWORD . . . 126
    ;ENDSCREEN . . . 128
    ;PAGE . . . 128
  Compiling and linking online Help . . . 128
  Binary Help file format . . . 129
    File Stamp . . . 130
    File Signature . . . 130
    File Version . . . 130
    Record Headers . . . 131
    File Header Record . . . 131
      options . . . 132
        OF_CaseSense (0x0004) . . . 132
      mainIndexScreen . . . 132
      maxScreenSize . . . 132
      height, width . . . 132
      leftMargin . . . 132
    Compression Record . . . 132
    Context table . . . 133
    Index table . . . 133
    Text Record . . . 134
    Keyword Record . . . 135